{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<i>Copyright (c) Recommenders contributors.</i>\n",
                "\n",
                "<i>Licensed under the MIT License.</i>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "source": [
                "# RBM Deep Dive with Tensorflow \n",
                "\n",
                "In this notebook we provide a complete walkthrough of the Restricted Boltzmann Machine (RBM) algorithm with applications to recommender systems. In particular, we use as a case study the [movielens dataset](https://movielens.org), comprising user's ranking of movies on a scale of 1 to 5. A quickstart version of this notebook can be found [here](../00_quick_start/rbm_movielens.ipynb).  \n",
                "\n",
                "### Overview \n",
                "\n",
                "A Restricted Boltzmann Machine (RBM) is a generative neural network model typically used to perform unsupervised learning. The main task of an RBM is to learn the joint probability distribution $P(v,h)$, where $v$ are the visible units and $h$ the hidden ones. The hidden units represent latent variables while the visible units are clamped on the input data. Once the joint distribution is learnt, new examples are generated by sampling from it.    \n",
                "\n",
                "The implementation presented here is based on the article by Ruslan Salakhutdinov, Andriy Mnih and Geoffrey Hinton [Restricted Boltzmann Machines for Collaborative Filtering](https://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf) with the exception that here we use multinomial units instead of the one-hot encoded used in the paper.  \n",
                "\n",
                "### Advantages of RBM: \n",
                "\n",
                "The model generates ratings for a user/movie pair using a collaborative filtering based approach. While matrix factorization methods learn how to reproduce an instance of the user/item affinity matrix, the RBM learns its underlying probability distribution. This has several advantages: \n",
                "\n",
                "- Generalizability : the model generalize well to new examples as long as they do not differ much in probability\n",
                "- Stability in time: if the recommendation task is time-stationary, the model does not need to be trained often to accomodate new ratings/users. \n",
                "- The tensorflow implementation presented here allows fast, scalable  training on GPU \n",
                "\n",
                "### Outline \n",
                "\n",
                "This notebook is organized as follows:\n",
                "\n",
                "1. RBM Theory \n",
                "2. Tensorflow implementation and model parameters  \n",
                "3. Data preparation and inspection\n",
                "4. Model application, performance and analysis  \n",
                "\n",
                "Sections 1 and 2 require basic knowledge of linear algebra, probability theory and tensorflow while  \n",
                "sections 3 and 4 only require some basic data science understanding. **Feel free to jump to the section you are most interested in!**"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 0 Global Settings and Import"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "System version: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) \n",
                        "[GCC 9.4.0]\n",
                        "Pandas version: 1.3.5\n",
                        "Tensorflow version: 2.7.0\n"
                    ]
                }
            ],
            "source": [
                "import sys\n",
                "import pandas as pd\n",
                "import matplotlib.pyplot as plt\n",
                "%matplotlib inline \n",
                "\n",
                "import logging\n",
                "import numpy as np\n",
                "import tensorflow as tf\n",
                "tf.get_logger().setLevel(logging.ERROR)\n",
                "\n",
                "from recommenders.models.rbm.rbm import RBM\n",
                "from recommenders.datasets.python_splitters import numpy_stratified_split\n",
                "from recommenders.datasets.sparse import AffinityMatrix\n",
                "from recommenders.utils.timer import Timer\n",
                "from recommenders.utils.plot import line_graph\n",
                "from recommenders.datasets import movielens \n",
                "from recommenders.evaluation.python_evaluation import (\n",
                "    map_at_k,\n",
                "    ndcg_at_k,\n",
                "    precision_at_k,\n",
                "    recall_at_k,\n",
                ")\n",
                "\n",
                "#For interactive mode only\n",
                "%load_ext autoreload\n",
                "%autoreload 2\n",
                "\n",
                "print(\"System version: {}\".format(sys.version))\n",
                "print(\"Pandas version: {}\".format(pd.__version__))\n",
                "print(\"Tensorflow version: {}\".format(tf.__version__))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. RBM Theory \n",
                "\n",
                "## 1.1 Overview and main differences with other recommender algorithms\n",
                "\n",
                "A Restricted Boltzmann Machine (RBM) is an undirected graphical model originally devised to study the statistical mechanics (or physics) of magnetic systems. Statistical mechanics (SM) provides a probabilistic description of complex systems made of a huge number of constituents (typically $\\sim 10^{23}$); instead of looking at a particular instance of the system, the aim of SM is to describe their **typical** behaviour. This approach has been succesfull for the description of gases, liquids, complex materials (e.g. semiconductors) and even the famous [Higgs boson](https://en.wikipedia.org/wiki/Higgs_boson)!\n",
                "\n",
                "Being designed to handle and organize a large amount of data, SM finds ideal applications in modern learning algorithms. In the context of **recommender systems**, the idea is to learn typical user behaviour instead of particular instances. To better understand this consider the most general setup of a recommendation problem: there are $m$ users rating $n$ items according to some scale (e.g. 1 to 5). In a typical scenario of online shopping, streaming services or decision processes, the user only rates a subset $l \\ll m$ of the products. If we now create a matrix representation of this problem, we obtain the user/item affinity matrix $X$. In a more readable table form, $X$ will look like this:\n",
                "\n",
                "\n",
                "|  $X$   |$i_1$  |$i_2$  |$i_3$  |  ... |$i_m$  | \n",
                "|-----|-------|-------|-------|------|-------|\n",
                "|$u_1$|5      |0      |2      |0 ... |1      |\n",
                "|$u_2$|0      |0      |3      |4 ... |0      |\n",
                "|...  |...    |...    |...    |...   |...    |\n",
                "|$u_m$|3      |3      |0      |5...  |2      |\n",
                "\n",
                "\n",
                "where the zeroes denote unrated items. In a nutshell, the recommender task is to \"fill in\" the missing ratings (later we will see that in practice this is not the only criteria used to recommend a product). The classical approach to this problem is called matrix factorization: the basic idea is to decompose $X$ into a user ($P$) and item ($Q$) matrix, such that $X = Q^T P$. The dimensions of the two matrices are $dim(Q) = (f, n)$ and $dim(P)= (f,m)$ where $f \\le m,n$ is the number of latent factors, e.g. the genre of a movie, the type of food etc... and it is an hyperparameter of the model, for more details see the [ALS notebook](../02_model/als_deep_dive.ipynb). By learning $Q$ and $P$ we try to reproduce a particular instance of $X$ (provided by the available data) and use this information to fill up the missing matrix elements. \n",
                "\n",
                "The RBM approach is to look at $X$ as a particular realization (sample) of a more general process; instead of learning a specific $X$, we try to learn the matrix distribution from which $X$ has been sampled from. Effectively, we learn the typical distribution of *tastes* (i.e. latent factors) and use this information to *generate* new ratings. For this reason, this class of neural network models is also called **generative**. Consider the following example: imagine you are given the  income distribution per age window of a particular country (this is easy to find from goverments data), then we could fix the age window and *generate* virtual citizens with various incomes by sampling from this distibution.   "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1.2 Model \n",
                "\n",
                "The central quantity of every SM model is the [Boltzmann distribution ](https://en.wikipedia.org/wiki/Boltzmann_distribution); this can be seen as the least biased probability distribution on a given probability space $\\Sigma$ and can be obtained using a maximum entropy principle on the space of distributions over $\\Sigma$. Its typical form is: \n",
                "\n",
                "$$P = \\frac{1}{Z} \\, e^{- \\beta \\, H},$$ \n",
                "\n",
                "where $Z$ is a normalization constant known as the partition function, $\\beta$ is a noise parameter with units of inverse energy and $H$ is the Hamiltonian, or energy function of the system. For this reason, this class of models is also known as *energy based* in computer science. In physics, $\\beta$ is the inverse temperature of the system in units of Boltzmann's constant, but here we will effectively rescale it inside $H$, so that this is now a pure number. $H$ describes the behaviour of two sets of stochastic vectors, typically called $v_i$ (visibles) and $h_j$ (hidden). The former constitute both the input *and* the ouput of the algo (this will be clear later), while the hidden units are the latent factors we want to learn. This structure results in the following Neural Network topology:\n",
                "\n",
                "![rbm1](https://raw.githubusercontent.com/recommenders-team/resources/main/images/RBM1.png)\n",
                "\n",
                "The input of the movielens database consists of ratings from 1 to 5; we shall thus consider a discrete configuration space of $m$ visible variables, each taking values in a finite set $\\chi_v = \\{ 1, 2, 3,4,5 \\}$. A global configuration of the system is determined by $\\mathbf{v} = (v_1, v_2, ..., v_m) \\in \\chi_v^m$ and we reserve $0$ for an unrated movie. We also need to specify the hidden units, that we take as random binary variables $\\chi_h = \\{0,1 \\}$ denoting if the particular unit is active or not and $\\mathbf{h} = (h_1, h_2, ...,h_n) \\in \\chi_h^n$. The hidden units may describe attributes such as the genre of a movie; for example, given a sci-fi/horror movie, only the hidden units describing such attributes should be active. The minimal model for such a system is defined by the following Hamiltonian: \n",
                "\n",
                "$$H = - \\sum_{i,j \\in G} v_i \\, w_{ij} \\, h_j - \\sum_{i=1}^m v_i \\, a_i - \\sum_{j=1}^n h_i \\, b_i$$\n",
                "\n",
                "The first term is an \"interaction term\", capturing the correlations between the visible and hidden units, while the other two terms are \"potential terms\", taking into account the bias of the units. The correlation matrix $w_{ij}$ and the two biases $a_i$ and $b_i$ are learning parameters to be fixed by the minimization of a properly defined cost function. Remember that this is an unsupervised problem, i.e. there is no real output and therefore we cannot directly minimize the error function between the prediction and the labeled data. As in every SM problem, the right quantity to minimize is the Free energy (remember that $\\beta =1$)\n",
                "\n",
                "$$ F =- \\log Z =- \\log \\sum_{ v_i, h_i } P(v, h) $$\n",
                "\n",
                "In the language of probability theory, the above quantity is the cumulant generating function. One way of evaluating the free energy is to use a [Markov-chain Montecarlo sampling](https://en.wikipedia.org/wiki/Monte_Carlo_method#Computer_graphics) algorithm such as the Metropolis-Hasting; here we will use instead an approximate method called Contrastive divergence, based on [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) (see below). The latter has the advantage of being faster than Montecarlo. Once the candidate $F$ has been found, we fix the learning parameters by minimizing $F$. Let us see how this works in practice in the next section. "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1.3 Learning Algorithm \n",
                "\n",
                "Instead of sampling directly from the joint probability distribution, one can evaluate the conditional distributions   \n",
                "\n",
                "$$ P(v, h) = P(v|h) P(h) = P(h|v) P(v) $$ \n",
                "\n",
                "where the second equality follows from the fact that the model is undirected or, in physical terms, it is in equilibrium. Gibbs sampling essentially consists of two steps called **positive** and **negative** phases:\n",
                "\n",
                "### Positive \n",
                "\n",
                "**Fix the visible units on the data and evaluate $P(h_j =1| \\mathbf{v})$**, i.e. the probability that the jth hidden unit is active given the entire input vector. In practice, it is convenient to evaluate the generating function: \n",
                "\n",
                "$$ Z[v,b] = \\prod_j \\sum_{h_j = 0,1}  e^{(\\sum_i w_{ij} v_i + b_j) h_j} = \\prod_j \\left( 1+  e^{\\sum_i w_{ij} v_i + b_j} \\right).$$\n",
                "\n",
                "Taking the gradients with respect to the bias we obtain \n",
                "\n",
                "$$\\frac{\\partial}{\\partial b_j}\\log Z[v,b] =  \\frac{1}{1+ e^{-(\\sum_i w_{ij} v_i + b_j)}} = \\sigma( \\phi_j(v, b) ),$$\n",
                "\n",
                "where $\\phi_j(v,b) = \\sum_i w_{ij} v_i + b_j $ and we have identified the logistic function $\\sigma(.) \\equiv P(h_j=1|v,b)$. \n",
                "\n",
                "**Use $\\sigma$ to sample the value of $h_j$** \n",
                "\n",
                "### Negative \n",
                "\n",
                "**Use the sampled value of the hidden units to evaluate $P(v_i = q |h)$**, where $q=1,...,5$. This is given by the multinomial expression\n",
                "\n",
                "$$ P(v_i = q |h,a) =  \\prod_{v_i=1}^q e^{v_i (\\sum_j w_{ij} \\, h_j + a_i ) }/Z_q $$\n",
                "\n",
                "where $Z_q$ is the partition function evaluated over the $q$ outcomes (note that $0$ should not be included in the sum). Finally, sample the values of $v_i$ from the above distribution. Clearly, these new $v_i$ are not necessarily those we have used as an input, at least not at the beginning of training. The above steps are repeated $k$ times, where $k$ is usually increased during training according to a given protocol. \n",
                "\n",
                "At the end of each k-step Gibbs sampling, we evaluate the difference between the initial free energy at $k=0$ (given v) and the one after k-steps \n",
                "\n",
                "$$ \\Delta F = F_0 - F_k, $$\n",
                "\n",
                "and update the learning parameters $w_{ij}$, $b_i$ and $a_i$: \n",
                "\n",
                "$$ \\frac{\\partial}{\\partial b_j} \\Delta F = \\frac{\\partial}{\\partial b_j} (\\log Z_0[v,b] - \\log Z_k[v,b]) = P_0(h_j=1|v,b) - P_k(h_j=1|v,b) $$\n",
                "\n",
                "$$ \\frac{\\partial}{\\partial w_{ij} } \\Delta F = v_i \\, P_0(v_i = q|h, a) - v_i P_k(v_i| h,a) \\equiv \\langle v_i\\rangle_0 - \\langle v_i \\rangle_k. $$\n",
                "\n",
                "This process is repeated for each training epoch, eventually until $\\Delta F =0$, i.e. the learned distribution faithfully reproduces the empirical one. In this sense, $v_i$ serves both as an input and output of the model. As $w_{ij}$ contains informations on how users' votes are correlated, we can use this information to generate ratings for the unseen movies by sampling from the learned, marginal distribution:\n",
                "\n",
                "$$ \\langle v_i \\rangle = \\sum_{v_i} v_i \\, P(v) $$ \n",
                "\n",
                "The entire workflow is summarised below \n",
                "\n",
                "![gibbs](https://raw.githubusercontent.com/recommenders-team/resources/main/images/Gsampling.png)\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. TensorFlow implemetation and model parameters \n",
                "\n",
                "In this section we briefly describe how the algorithm is implemented in Tensorflow and which parameters can be customized by the user during training. We also discuss some best practices to be used when training the RBM model on a recommendation task. Further technical details are explained directly in the code. \n",
                "\n",
                "Tensorflow (TF) is an open source framework to develop deep learning (DL) models in a fast and efficient way. One of the shared characteristics of DL frameworks is autodifferentiation, i.e. the symbolic evaluation of gradients, that will be particulary useful here. The other advantage of TF is the generation and optimization of the symbolic operations defined on a computational graph, for fast and scalable deployment on both CPU and GPU. For more informations on TF see [here](www.tensorflow.com). Unfortunately, TF is tailor made for supervised learning tasks, so its application to unsupervised model needs some more work. Note: although TF has recently started developing a [set of libraries to perform probabilistic inference](www.tensorflow_probability.com), we found their performance still not optimal and therefore we will not use them here. \n",
                "\n",
                "\n",
                "The RBM model is instantiated as a class with several methods to build the graph, perform sampling, training and inference. The skeleton of the graph is built at the moment the class is instantiated; mandatory fields are: \n",
                "\n",
                "- `hidden_units` integer (Default =500) : number of hidden units\n",
                "\n",
                "\n",
                "- `training_epoch`integer (Default = 20): number of training epochs \n",
                "\n",
                "\n",
                "- `minibatch_size`integer (Default = 100): size of the batch to be chosen at random at each training epoch \n",
                "\n",
                "\n",
                "The optional parameters are: \n",
                "\n",
                "- `keep_prob` : float (Default = 0.7) we use dropout regularization on the hidden units, so this parameter specifies the probability of keeping the connection to a hidden unit active. Dropout will affect specific matrix elements of $w_{ij}$, decreasing in this way the model's complexity and improving generalization. \n",
                "\n",
                "\n",
                "- `sampling_protocol` : Array (Default = $[50, 70, 80,90,100]$) percentage of the entire training epochs when the the k-sampling step is increased in an annealing fashion. In the default case, the first 50% of the training epochs are sampled with a single k-step. As training converges, the number of k-steps is increased by $1$ at each percentage.\n",
                "\n",
                "\n",
                "- `debug`: Boolean (Default = False) if True, prints the output of some of the intermediate steps for inspection. \n",
                "\n",
                "\n",
                "- `with_metrics`: Boolean (Default= False) if True it evaluates, print and finally plot the mean squared root error per training epoch on the training set. At the end, it also evaluates and print the total model accuracy both on the training and test set. We suggest to switch it off only for benchmarking execution time.  \n",
                "\n",
                "\n",
                "- `init_stdv`: float (Default = 0.1) standard deviation used to inititialize the correlation matrix. \n",
                "\n",
                "\n",
                "- `learning_rate`: float (Default = 0.004) init learning rate used in the optimization algorithm. Note that the optimizer uses a different, effective learning rate scaled to the batch size $\\alpha$ = `learning_rate/minibatch_size`. \n",
                "\n",
                "\n",
                "- `display_epoch `: integer (Default = 10) the number of epochs after which the rmse error is printed out during the learning phase. \n",
                "\n",
                "\n",
                "Although optional, it is likely that `sampling_protocol` needs to be modified for different recommenders; we recommend to keep this in mind when training on a new dataset. "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# 3 Data preparation and inspection \n",
                "\n",
                "The MovieLens dataset comes in different sizes, denoting the number of available ratings. The number of users and rated movies also changes across the different dataset. The data are imported in a pandas dataframe including the **user ID**, the **item ID**, the **ratings** and a **timestamp** denoting when a particular user rated a particular item. Although this last feature could be explicitely included, it will not be considered here. The underlying assumption of this choice is that user's tastes are weakly time dependent, i.e. a user's taste typically chage on time scales (usually years) much longer than the typical recommendation time scale (e.g. hours/days). As a consequence, the joint probability distribution we want to learn can be safely considered as time dependent. Nevertheless, timestamps could be used as *contextual variables*, e.g. recommend a certain movie during the weekend and another during weekdays.  \n",
                "\n",
                "Below, we first load the different movielens data in pandas dataframes, explain how the user/affinity matrix is built and how the train/test set is generated. As this procedure is common to all the datasets considered here, we explain it in details only for the 1m dataset.  \n",
                "\n",
                "We start with downloading the different datasets:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "100%|██████████| 4.81k/4.81k [00:00<00:00, 30.6kKB/s]\n"
                    ]
                },
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>userID</th>\n",
                            "      <th>movieID</th>\n",
                            "      <th>rating</th>\n",
                            "      <th>timestamp</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>196</td>\n",
                            "      <td>242</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>881250949</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>186</td>\n",
                            "      <td>302</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>891717742</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>22</td>\n",
                            "      <td>377</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>878887116</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>244</td>\n",
                            "      <td>51</td>\n",
                            "      <td>2.0</td>\n",
                            "      <td>880606923</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>166</td>\n",
                            "      <td>346</td>\n",
                            "      <td>1.0</td>\n",
                            "      <td>886397596</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   userID  movieID  rating  timestamp\n",
                            "0     196      242     3.0  881250949\n",
                            "1     186      302     3.0  891717742\n",
                            "2      22      377     1.0  878887116\n",
                            "3     244       51     2.0  880606923\n",
                            "4     166      346     1.0  886397596"
                        ]
                    },
                    "execution_count": 2,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "MOVIELENS_DATA_SIZE = '100k'\n",
                "\n",
                "mldf_100k = movielens.load_pandas_df(\n",
                "    size=MOVIELENS_DATA_SIZE,\n",
                "    header=['userID','movieID','rating','timestamp']\n",
                ") \n",
                "\n",
                "mldf_100k.head()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "100%|██████████| 5.78k/5.78k [00:00<00:00, 43.3kKB/s]\n"
                    ]
                },
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>userID</th>\n",
                            "      <th>movieID</th>\n",
                            "      <th>rating</th>\n",
                            "      <th>timestamp</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>1</td>\n",
                            "      <td>1193</td>\n",
                            "      <td>5.0</td>\n",
                            "      <td>978300760</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>1</td>\n",
                            "      <td>661</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>978302109</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>1</td>\n",
                            "      <td>914</td>\n",
                            "      <td>3.0</td>\n",
                            "      <td>978301968</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>1</td>\n",
                            "      <td>3408</td>\n",
                            "      <td>4.0</td>\n",
                            "      <td>978300275</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>1</td>\n",
                            "      <td>2355</td>\n",
                            "      <td>5.0</td>\n",
                            "      <td>978824291</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   userID  movieID  rating  timestamp\n",
                            "0       1     1193     5.0  978300760\n",
                            "1       1      661     3.0  978302109\n",
                            "2       1      914     3.0  978301968\n",
                            "3       1     3408     4.0  978300275\n",
                            "4       1     2355     5.0  978824291"
                        ]
                    },
                    "execution_count": 3,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "MOVIELENS_DATA_SIZE = '1m'\n",
                "\n",
                "mldf_1m = movielens.load_pandas_df(\n",
                "    size=MOVIELENS_DATA_SIZE,\n",
                "    header=['userID','movieID','rating','timestamp']\n",
                ")\n",
                "\n",
                "mldf_1m.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### 3.1 Split the data using the stratified splitter  \n",
                "\n",
                "As a second step, we split the data into train and test set by mantaining the same matrix size. Clearly, the two matrices will contain different ratings in different proportions.  \n",
                "\n",
                "- First, we use the `AffinityMatrix` class to generate the $(m,n)$ user/affinity matrix $X$ defined in section **1.1**; this also returns the sparseness percentage. For example, for the 1m dataset, $95$ % of the matrix entries are zeros. This represents a challenge for the learning task: fixing $95$ % of entries with only $5$ % of data points. \n",
                "\n",
                "- Second, use the `numpy_stratified_split()` to split $X$ into train and test set. By default, we choose a $75$% to $25$% ratio. The split function selects, for every user, $25$ % of rated movies and it moves them in the new test matrix. This way of splitting the data makes sure the rating distribution remains the same across the train/test set, both locally (user-wise) and globally. If you consider the user/item matrix $X$ defined above, we would have\n",
                "\n",
                "### Train\n",
                "\n",
                "|  $X_{tr}$   |$i_1$  |$i_2$  |$i_3$  |  $...$ |$i_n$  |    \n",
                "|-----|-------|-------|-------|--------|-------|\n",
                "|$u_1$|$0$    |$0$    |$2$    |$0...$  |$0$    |\n",
                "|$u_2$|$0$    |$0$    |$3$    |$0...$  |$0$    |\n",
                "|$...$|$...$  |$...$  |$...$  |$...$   |$...$  |\n",
                "|$u_m$|$3$    |$0$    |$0$    |$0...$  |$2$    |\n",
                "\n",
                "\n",
                "### Test \n",
                "\n",
                "| $X_{tst}$    |$i_1$  |$i_2$  |$i_3$  |  ... |$i_n$  | \n",
                "|-----|-------|-------|-------|------|-------|\n",
                "|$u_1$|5      |0      |0      |0 ... |1      |\n",
                "|$u_2$|0      |0      |0      |4 ... |0      |\n",
                "|...  |...    |...    |...    |...   |...    |\n",
                "|$u_m$|0      |3      |0      |5...  |0      |\n",
                "\n",
                "The Train and Test matrices have exactly the same dimensions (i.e. same numbers of users and movies) but contain different ratings. Once the model is trained, at inference time, we use the test set user vectors to obtain the inferred values for the ratings."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [],
            "source": [
                "#to use standard names across the analysis \n",
                "header = {\n",
                "        \"col_user\": \"userID\",\n",
                "        \"col_item\": \"movieID\",\n",
                "        \"col_rating\": \"rating\",\n",
                "    }"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [],
            "source": [
                "#instantiate the splitter \n",
                "am1m = AffinityMatrix(df = mldf_1m, **header)\n",
                "\n",
                "#obtain the sparse matrix \n",
                "X1m, _, _ = am1m.gen_affinity_matrix()"
            ]
        },
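        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "As a rough guide to the sparseness figure mentioned above, one common definition (an assumption here; the exact convention used inside `AffinityMatrix` is not shown in this notebook) is the percentage of zero entries in the $(m,n)$ matrix $X$:\n",
                "\n",
                "$$\\text{sparseness} = 100 \\times \\left(1 - \\frac{N_{\\text{rated}}}{m \\, n}\\right)$$\n",
                "\n",
                "where $N_{\\text{rated}}$ is a symbol introduced here for the number of non-zero entries of $X$. With roughly $5$ % of the entries rated, as in the 1m dataset, this gives a sparseness of about $95$ %."
            ]
        },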
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Next, we split the matrix above into train and test set sparse matrices "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {},
            "outputs": [],
            "source": [
                "Xtr_1m, Xtst_1m = numpy_stratified_split(X1m)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "It is useful to inspect the distribution of ratings in the test/train matrix to make sure that the splitter keeps it constant. We can inspect this by plotting the normalized histograms"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "[Text(0.5, 0, 'ratings'), Text(0, 0.5, 'density')]"
                        ]
                    },
                    "execution_count": 7,
                    "metadata": {},
                    "output_type": "execute_result"
                },
                {
                    "data": {
                        "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmEAAAFNCAYAAABIc7ibAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAWLUlEQVR4nO3dfbBnd10f8PeHXShCgoDZwUgCGzDVWZQKs4ZOaUEQnWzBhApqqLFAw6RYYnGgU5bioNCh8uDw4JC2BgF5KI0UpM2Y5cFqkOLwkE0IkRAzhBDMRpEFgSQS8vjpH/e3zu2yu/nt3Xvu93fvfb1m7uSc7znndz8ns/OZ9/2e8zunujsAAKyte40uAABgMxLCAAAGEMIAAAYQwgAABhDCAAAGEMIAAAYQwli3quqDVfXs0XUAwEoIYaypqrpl2c/dVXXrsvVfPJrP6u5d3f2OqWoFOJzV7GWzz/toVT1vilpZXFtHF8Dm0t3HHViuquuTPK+7/8/B+1XV1u6+cy1rA5jXvL0MjsRMGAuhqn6iqvZV1Uuq6itJ3l5VD6qqP6yq/VX1jdnyScuO+fu/HKvqOVX18ar6rdm+X6qqXcNOCNiUqupeVbW7qr5YVV+vqvdW1YNn2+5bVe+ejX+zqi6tqodU1auS/LMkb57NpL157FmwVoQwFsn3J3lwkocnOTdL/z7fPlt/WJJbkxypOT0uyTVJTkjy2iRvraqasmCAg/xKkqcneWKSH0jyjSTnz7Y9O8n3Jjk5yfcleX6SW7v7ZUn+b5Lzuvu47j5vrYtmDCGMRXJ3kl/v7tu6+9bu/np3v7+7v93dNyd5VZYa2+F8ubvf0t13JXlHkhOTPGQN6gY44PlJXtbd+7r7tiS/keSZVbU1yR1ZCl8/2N13dfdl3X3TwFoZzD1hLJL93f2dAytVdb8kb0hyepIHzYaPr6ots6B1sK8cWOjub88mwY47xH4AU3l4kg9U1d3Lxu7K0h+E78rSLNiFVfXAJO/OUmC7Y82rZCGYCWOR9EHrL07yQ0ke190PSPKE2bhLjMCiuiHJru5+4LKf+3b3jd19R3e/ort3JPknSZ6W5F/Njju4/7EJCGEssuOzdB/YN2c3tv764HoA7sl/S/Kqqnp4klTVtqo6c7b8pKr60arakuSmLF2ePDBj9jdJHjGiYMYRwlhkb0zyPUm+luSTST40tBqAe/amJBcl+UhV3Zyl3vW42bbvT/K+LAWwq5P8aZYuUR447pmzb3f/9tqWzCjVbQYUAGCtmQkDABhACAMAGEAIAwAYQAgDABhACAMAGGDdPTH/hBNO6O3bt48uA1hDl1122de6e9voOo6V/gWbz5H617oLYdu3b8/evXtHlwGsoar68ugaVoP+BZvPkfqXy5EAAAMIYQAAAwhhAAADCGEAAAMIYQAAAwhhAAADCGEAAAMIYQAAAwhhAAADCGEAAAMIYQAAA6y7d0fCFLbvvnh0CSty/aufOroEYDD9a/0yEwYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwwKQhrKpOr6prquraqtp9hP2eUVVdVTunrAcAYFFMFsKqakuS85PsSrIjybOqasch9js+yQuTfGqqWgAAFs2UM2GnJbm2u6/r7tuTXJjkzEPs95+SvCbJdyasBQBgoUwZwh6a5IZl6/tmY3+vqh6b5OTuvvhIH1RV51bV3qrau3///tWvFGAi+hdwOMNuzK+qeyV5fZIX39O+3X1Bd+/s7p3btm2bvjiAVaJ/AYczZQi7McnJy9ZPmo0dcHySH0ny0aq6Psk/TnKRm/MBgM1gyhB2aZJTq+qUqrpPkrOSXHRgY3d/q7tP6O7t3b09ySeTnNHdeyesCQBgIUwWwrr7ziTnJflwkquTvL
e7r6qqV1bVGVP9XgCA9WDrlB/e3XuS7Dlo7OWH2fcnpqwFAGCReGI+AMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAAQhgAwABbRxfAxrJ998WjSwBYEf2LtWYmDABgACEMAGAAIQwAYAAhDABgACEMAGAAIQwAYAAhDABgACEMAGAAIQwAYAAhDABgACEMAGCASUNYVZ1eVddU1bVVtfsQ259fVX9eVVdU1ceraseU9QAALIrJQlhVbUlyfpJdSXYkedYhQtZ7uvtHu/vHkrw2yeunqgcAYJFMORN2WpJru/u67r49yYVJzly+Q3fftGz1/kl6wnoAABbG1gk/+6FJbli2vi/J4w7eqapekORFSe6T5MkT1gMAsDCG35jf3ed39yOTvCTJrx1qn6o6t6r2VtXe/fv3r22BAMdA/wIOZ8oQdmOSk5etnzQbO5wLkzz9UBu6+4Lu3tndO7dt27Z6FQJMTP8CDmfKEHZpklOr6pSquk+Ss5JctHyHqjp12epTk3xhwnoAABbGZPeEdfedVXVekg8n2ZLkbd19VVW9Msne7r4oyXlV9ZQkdyT5RpJnT1UPAMAimfLG/HT3niR7Dhp7+bLlF075+wEAFtXwG/MBADYjIQwAYAAhDABgACEMAGAAIQwAYIBJvx0JTGv77otHl7Ai17/6qaNLAAZbr/0rWb0eZiYMAGAAIQwAYAAhDABgACEMAGAAIQwAYAAhDABgACEMAGAAIQwAYAAhDABgACEMAGAAIQwAYAAhDABgACEMAGAAIQwAYIC5QlhV/UxVCWwAy+iNwLGYt3n8QpIvVNVrq+qHpywIYB3RG4EVmyuEdffZSR6T5ItJfq+qPlFV51bV8ZNWB7DA9EbgWMw9jd7dNyV5X5ILk5yY5F8kubyqfmWi2gAWnt4IrNS894SdWVUfSPLRJPdOclp370ryj5K8eLryABaX3ggci61z7vezSd7Q3R9bPtjd366qc1a/LIB1QW8EVmzey5FfObjJVNVrkqS7/3jVqwJYH/RGYMXmDWE/dYixXatZCMA6pDcCK3bEy5FV9ctJ/m2SR1bVlcs2HZ/kz6YsDGBR6Y3Aarine8Lek+SDSX4zye5l4zd3999OVhXAYtMbgWN2TyGsu/v6qnrBwRuq6sGaDbBJ6Y3AMZtnJuxpSS5L0klq2bZO8oiJ6gJYZHojcMyOGMK6+2mz/56yNuUALD69EVgN8z6s9fFVdf/Z8tlV9fqqeti0pQEsNr0ROBbzPqLivyb5dlUdeAr0F5O8a7KqANYHvRFYsXlD2J3d3UnOTPLm7j4/S1/FBtjM9EZgxeZ9bdHNVfXSJGcneUJV3StL70kD2Mz0RmDF5p0J+4UktyU5p7u/kuSkJK+brCqA9UFvBFZsrpmwWXN5/bL1v0zyzqmKAlgP9EbgWMz77cifraovVNW3quqmqrq5qm6aujiARaY3Asdi3nvCXpvkZ7r76imLAVhn9EZgxea9J+xvNBmA76I3Ais270zY3qr6/ST/K0s3oSZJuvsPpigKYJ3QG4EVmzeEPSDJt5P89LKxTqLRAJuZ3gis2Lzfjnzu1IUArDd6I3As5v125D+sqj+uqs/N1h9dVb82bWkAi01vBI7FvDfmvyXJS5PckSTdfWWSs6YqCmCd0BuBFZs3hN2vuz990Nidq10MwDqjNwIrNm8I+1pVPTJLN5ymqp6Z5K8nqwpgfdAbgRWb99uRL0hyQZIfrqobk3wpyS9OVhXA+qA3Ait2xBBWVS9atronySVZmj37uyTPyLJ3ph3m+NOTvC
nJliS/292vPsTnPy9L0/f7k/zr7v7yUZ4DwJo61t4IkNzz5cjjZz87k/xykgcleWCS5yd57JEOrKotSc5PsivJjiTPqqodB+32mSQ7u/vRSd6XpVeAACy6FfdGgAOOOBPW3a9Ikqr6WJLHdvfNs/XfSHLxPXz2aUmu7e7rZsdcmOTMJJ9f9vmXLNv/k0nOPsr6AdbcMfZGgCTz35j/kCS3L1u/fTZ2JA9NcsOy9X2zscM5J8kH56wHYBGspDcCJJn/xvx3Jvl0VX1gtv70JL+3WkVU1dlZmtZ/4mG2n5vk3CR52MMetlq/FuBY3WNv1L+Aw5lrJqy7X5XkuUm+Mft5bnf/5j0cdmOSk5etnzQb+/9U1VOSvCzJGd1928HbZ7//gu7e2d07t23bNk/JAJObpzfqX8DhzDsTlu6+PMnlR/HZlyY5tapOyVL4OivJv1y+Q1U9JsnvJDm9u796FJ8NsBBW0BsBksx/T9hR6+47k5yX5MNJrk7y3u6+qqpeWVVnzHZ7XZLjkvzPqrqiqi6aqh4AgEUy90zYSnT3niw9Q2f52MuXLT9lyt8PALCoJpsJAwDg8IQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAYQwAIABhDAAgAGEMACAAbaOLoBD27774tElAKyI/gXzMRMGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADDApCGsqk6vqmuq6tqq2n2I7U+oqsur6s6qeuaUtQAALJLJQlhVbUlyfpJdSXYkeVZV7Thot79M8pwk75mqDgCARbR1ws8+Lcm13X1dklTVhUnOTPL5Azt09/WzbXdPWAcAwMKZ8nLkQ5PcsGx932zsqFXVuVW1t6r27t+/f1WKA1gL+hdwOOvixvzuvqC7d3b3zm3bto0uB2Bu+hdwOFOGsBuTnLxs/aTZGADApjdlCLs0yalVdUpV3SfJWUkumvD3AQCsG5OFsO6+M8l5ST6c5Ook7+3uq6rqlVV1RpJU1Y9X1b4kP5fkd6rqqqnqAQBYJFN+OzLdvSfJnoPGXr5s+dIsXaYEANhU1sWN+QAAG40QBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADDApC/wXgTbd188ugSAFdG/YGMzEwYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMIAQBgAwgBAGADCAEAYAMMCkIayqTq+qa6rq2qrafYjt/6Cqfn+2/VNVtX3KegAAFsVkIayqtiQ5P8muJDuSPKuqdhy02zlJvtHdP5jkDUleM1U9AACLZMqZsNOSXNvd13X37UkuTHLmQfucmeQds+X3JfnJqqoJawIAWAhThrCHJrlh2fq+2dgh9+nuO5N8K8n3TVgTAMBC2Dq6gHlU1blJzp2t3lJV1xzF4Sck+drqV7VQNsM5Js5zw6jXHPU5PnyqWqamf81lM5znZjjHZJOc51H2sMP2rylD2I1JTl62ftJs7FD77KuqrUm+N8nXD/6g7r4gyQUrKaKq9nb3zpUcu15shnNMnOdGshnO8QD9655thvPcDOeYOM+jNe
XlyEuTnFpVp1TVfZKcleSig/a5KMmzZ8vPTPIn3d0T1gQAsBAmmwnr7jur6rwkH06yJcnbuvuqqnplkr3dfVGStyZ5V1Vdm+RvsxTUAAA2vEnvCevuPUn2HDT28mXL30nyc1PWkBVeBlhnNsM5Js5zI9kM57gaNsv/p81wnpvhHBPneVTK1T8AgLXntUUAAANs2BBWVW+rqq9W1edG1zKVqjq5qi6pqs9X1VVV9cLRNU2hqu5bVZ+uqs/OzvMVo2uaSlVtqarPVNUfjq5lKlV1fVX9eVVdUVV7R9eziPSvjUP/2lhWu39t2MuRVfWEJLckeWd3/8joeqZQVScmObG7L6+q45NcluTp3f35waWtqtlbFO7f3bdU1b2TfDzJC7v7k4NLW3VV9aIkO5M8oLufNrqeKVTV9Ul2dveGf5bQSulfG4f+tbGsdv/asDNh3f2xLH3jcsPq7r/u7stnyzcnuTrf/VaCda+X3DJbvffsZ8P99VBVJyV5apLfHV0LY+lfG4f+xZFs2BC22VTV9iSPSfKpwaVMYjbNfUWSryb5o+7eiOf5xiT/Icndg+uYWif5SFVdNnuaPJuc/rUhvDH611ETwjaAqjouyfuT/Gp33zS6nil0913d/WNZevPCaVW1oS7RVNXTkny1uy8bXcsa+Kfd/dgku5K8YHbpjU1K/1r/9K+VE8LWudk9Bu9P8t+7+w9G1zO17v5mkkuSnD64lNX2+CRnzO43uDDJk6vq3WNLmkZ33zj771eTfCDJaWMrYhT9a8PQv1ZICFvHZjd8vjXJ1d39+tH1TKWqtlXVA2fL35Pkp5L8xdCiVll3v7S7T+ru7Vl6c8SfdPfZg8tadVV1/9lN2Kmq+yf56SQb9huAHJ7+tXHoXyu3YUNYVf2PJJ9I8kNVta+qzhld0wQen+SXsvRXxxWzn38+uqgJnJjkkqq6MkvvJP2j7t6wX4He4B6S5ONV9dkkn05ycXd/aHBNC0f/2lD0r41j1fvXhn1EBQDAItuwM2EAAItMCAMAGEAIAwAYQAgDABhACAMAGEAIY6FU1a9W1f2Wre858IwdgEWnh3E0PKKCNTd7SGN193e9Y2y131APsNr0MFaLmTDWRFVtr6prquqdWXrC8Furam9VXVVVr5jt8++S/ECWHmx4yWzs+qo6YXb81VX1ltkxH5k9fTpV9eNVdeXsYY+vq6rPzcYfVVWfno1fWVWnjjl7YL3Tw5iCEMZaOjXJf+nuRyV5cXfvTPLoJE+sqkd3928n+askT+ruJx3m+PNnx38zyTNm429P8m9mL8i9a9n+z0/yptn4ziT7Vv+UgE1ED2NVCWGspS939ydnyz9fVZcn+UySRyXZMcfxX+ruK2bLlyXZPrvX4vju/sRs/D3L9v9Ekv9YVS9J8vDuvvVYTwDY1PQwVpUQxlr6uySpqlOS/PskP9ndj05ycZL7znH8bcuW70qy9Ug7d/d7kpyR5NYke6rqySspGmBGD2NVCWGM8IAsNbNvVdVDkuxatu3mJMfP+0Hd/c0kN1fV42ZDZx3YVlWPSHLd7BLB/87SZQOAY6WHsSqOmMJhCt392ar6TJK/SHJDkj9btvmCJB+qqr86zD0Vh3JOkrdU1d1J/jTJt2bjP5/kl6rqjiRfSfKfV+UEgE1ND2O1eEQF615VHdfdt8yWdyc5sbtfOLgsgLnoYZuXmTA2gqdW1Uuz9O/5y0meM7YcgKOih21SZsIAAAZwYz4AwABCGADAAEIYAMAAQhgAwABCGADAAEIYAMAA/w/RzRawpzahEwAAAABJRU5ErkJggg==",
                        "text/plain": [
                            "<Figure size 720x360 with 2 Axes>"
                        ]
                    },
                    "metadata": {
                        "needs_background": "light"
                    },
                    "output_type": "display_data"
                }
            ],
            "source": [
                "_, (ax1m, ax2m) = plt.subplots(1, 2, sharey=True, figsize=(10,5))\n",
                "ax1m.hist(Xtr_1m[Xtr_1m !=0], 5, density= True)\n",
                "ax1m.set_title('Train')\n",
                "ax1m.set(xlabel=\"ratings\", ylabel=\"density\")\n",
                "ax2m.hist(Xtst_1m[Xtst_1m !=0], 5, density= True)\n",
                "ax2m.set_title('Test')\n",
                "ax2m.set(xlabel=\"ratings\", ylabel=\"density\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We now repeat the same operations for the other datasets"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [],
            "source": [
                "#100k\n",
                "am100k = AffinityMatrix(df = mldf_100k, **header)\n",
                "X100k, _, _= am100k.gen_affinity_matrix()\n",
                "Xtr_100k, Xtst_100k = numpy_stratified_split(X100k)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "[Text(0.5, 0, 'ratings'), Text(0, 0.5, 'density')]"
                        ]
                    },
                    "execution_count": 9,
                    "metadata": {},
                    "output_type": "execute_result"
                },
                {
                    "data": {
                        "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAFNCAYAAABFbcjcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAdEklEQVR4nO3df7Bfd13n8eeL1ILSosXeQWyaJmBQW2HBvYSdZamiFNIFm65WiVq3OHWydRvFAWdJF6dInCrgDqJDXChLFHBrRCp6xwYrSsFFKc3tD4sJZkhDoYkgoS202NI27Xv/+J7Al8tN8s3NPfl+7r3Px8ydnPM55/O979PpvOd1P+d7vt9UFZIkSWrD48ZdgCRJkr7OcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZFqUkH0hyybjrkCTpWBnO1IwkXxn6eSzJg0P7P3ssr1VV51fVu/qqVZIOZz57Wfd6H07yC33UqjadNO4CpEOq6pRD20nuBH6hqv5m5nlJTqqqgyeyNkka1ai9TDocV87UvCQ/nGRfktck+TzwB0lOS/KXSQ4kubfbXj4052t/aSZ5RZKPJvlf3bmfTnL+2C5I0pKU5HFJNiW5I8ndSd6b5MndsSck+aNu/EtJdiR5SpKrgBcAb+1W3t463qvQiWA400LxXcCTgbOADQz+3/2Dbn8F8CBwpKb1PGA3cDrwJuCdSdJnwZI0wy8BFwI/BHw3cC+wpTt2CfDtwJnAdwKXAQ9W1WuB/wdsrKpTqmrjiS5aJ57hTAvFY8Drquqhqnqwqu6uqmur6oGquh+4ikHDO5zPVNU7qupR4F3AU4GnnIC6JemQy4DXVtW+qnoI+HXgoiQnAY8wCGXfU1WPVtXNVXXfGGvVGPmeMy0UB6rqq4d2knwb8DvAWuC0bvjUJMu6ADbT5w9tVNUD3aLZKbOcJ0l9OQt4f5LHhsYeZfCH4nsYrJptS/IdwB8xCHKPnPAqNXaunGmhqBn7rwa+F3heVT0JOLcb91alpFbdBZxfVd8x9POEqtpfVY9U1eur6mzgPwIvA/5rN29m/9MiZzjTQnUqg/eZfal7Q+3rxlyPJB3N24CrkpwFkGQiybpu+4VJnplkGXAfg9uch1bY/hV42jgK1ngYzrRQvQX4VuCLwI3AX421Gkk6ut8FpoC/TnI/g971vO7YdwHvYxDMPgl8hMGtzkPzLuqeNv+9E1uyxiFVrpZKkiS1wpUzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYsmm8IOP3002vlypXjLkPSCXTzzTd/saomxl3HfLCHSUvLkfrXoglnK1euZHp6etxlSDqBknxm3DXMF3uYtLQcqX95W1OSJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhqyaL5bU+rDyk3XjbuEObnzDS8ddwmSpDly5UySJKkhrpxJkrRIufq/MLlyJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQ3oNZ0nWJtmdZE+STUc47yeSVJLJobErunm7k7ykzzolSZJa0dt3ayZZBmwBzgP2ATuSTFXVrhnnnQq8Evj40NjZwHrgHOC7gb9J8oyqerSveiVJklrQ58rZGmBPVe2tqoeBbcC6Wc77DeCNwFeHxtYB26rqoar6NLCnez1JkqRFrc9wdgZw19D+vm7sa5L8IHBmVV13rHMlSZIWo7E9EJDkccCbgVcfx2tsSDKdZPrAgQPzV5wknQD2MEmz6TOc7QfOHNpf3o0dcirwA8CHk9wJ/Adgqnso4GhzAaiqq6tqsqomJyYm5rl8SeqXPUzSbPoMZzuA1UlWJTmZwRv8pw4drKovV9XpVbWyqlYCNwIXVN
V0d976JI9PsgpYDdzUY62SJElN6O1pzao6mGQjcD2wDNhaVTuTbAamq2rqCHN3JnkvsAs4CFzuk5qSJGkp6C2cAVTVdmD7jLErD3PuD8/Yvwq4qrfiJEmSGuQ3BEiSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNaTXcJZkbZLdSfYk2TTL8cuSfCLJbUk+muTsbnxlkge78duSvK3POiVJklpxUl8vnGQZsAU4D9gH7EgyVVW7hk67pqre1p1/AfBmYG137I6qenZf9UmSJLWoz5WzNcCeqtpbVQ8D24B1wydU1X1Du08Eqsd6JEmSmtdnODsDuGtof1839g2SXJ7kDuBNwC8PHVqV5NYkH0nygh7rlCRJasbYHwioqi1V9XTgNcCvdcOfA1ZU1XOAVwHXJHnSzLlJNiSZTjJ94MCBE1e0JM0De5ik2fQZzvYDZw7tL+/GDmcbcCFAVT1UVXd32zcDdwDPmDmhqq6uqsmqmpyYmJivuiXphLCHSZpNn+FsB7A6yaokJwPrganhE5KsHtp9KfCpbnyie6CAJE8DVgN7e6xVkiSpCb09rVlVB5NsBK4HlgFbq2pnks3AdFVNARuTvAh4BLgXuKSbfi6wOckjwGPAZVV1T1+1SpIktaK3cAZQVduB7TPGrhzafuVh5l0LXNtnbZIkSS0a+wMBkiRJ+rpeV86kQ1Zuum7cJUiStCC4ciZJktQQV84kSToKV/91IrlyJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDeg1nSdYm2Z1kT5JNsxy/LMknktyW5KNJzh46dkU3b3eSl/RZpyRJUit6C2dJlgFbgPOBs4GfHg5fnWuq6plV9WzgTcCbu7lnA+uBc4C1wO93rydJkrSo9blytgbYU1V7q+phYBuwbviEqrpvaPeJQHXb64BtVfVQVX0a2NO9niRJ0qLWZzg7A7hraH9fN/YNklye5A4GK2e/fIxzNySZTjJ94MCBeStckk4Ee5ik2Yz9gYCq2lJVTwdeA/zaMc69uqomq2pyYmKinwIlqSf2MEmz6TOc7QfOHNpf3o0dzjbgwjnOlSRJWhT6DGc7gNVJViU5mcEb/KeGT0iyemj3pcCnuu0pYH2SxydZBawGbuqxVkmSpCac1NcLV9XBJBuB64FlwNaq2plkMzBdVVPAxiQvAh4B7gUu6ebuTPJeYBdwELi8qh7tq1ZJkqRW9BbOAKpqO7B9xtiVQ9uvPMLcq4Cr+qtOkiSpPWN/IECSJElfZziTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIactK4C5AkSRq2ctN14y5hTu58w0vn5XVcOZMkSWqI4UySJKkh3taUFqGFeksA5u+2gCQtVK6cSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ3pNZwlWZtkd5I9STbNcvxVSXYluT3J3yY5a+jYo0lu636m+qxTkiSpFb09rZlkGbAFOA/YB+xIMlVVu4ZOuxWYrKoHkvwi8Cbg5d2xB6vq2X3VJ0mS1KKRVs6S/FiSY11lWwPsqaq9VfUwsA1YN3xCVd1QVQ90uzcCy4/xd0hSc+bYMyUJGP225suBTyV5U5LvG3HOGcBdQ/v7urHDuRT4wND+E5JMJ7kxyYUj/k5JasFceqYkASOGs6q6GHgOcAfwh0k+lmRDklPno4gkFwOTwG8PDZ9VVZ
PAzwBvSfL0WeZt6ALc9IEDB+ajFEk6bqP2THuYpNmMvOxeVfcB72Nwe/KpwH8BbknyS4eZsh84c2h/eTf2DZK8CHgtcEFVPTT0+/Z3/+4FPsyg0c2s6eqqmqyqyYmJiVEvRZJ6N0rPtIdJms2o7zlbl+T9DELStwBrqup84N8Brz7MtB3A6iSrkpwMrAe+4anLJM8B3s4gmH1haPy0JI/vtk8Hng8MP0ggSc2aY8+UJGD0pzV/HPidqvq74cHuKctLZ5tQVQeTbASuB5YBW6tqZ5LNwHRVTTG4jXkK8KdJAD5bVRcA3w+8PcljDALkG2Y85SlJLTvmnilJh4wazj4/s8kkeWNVvaaq/vZwk6pqO7B9xtiVQ9svOsy8fwCeOWJtktSaOfVMSYLR33N23ixj589nIZK0iNgzJc3ZEVfOug+G/e/A05PcPnToVODv+yxMkhYae6ak+XC025rXMPjssd8Chr9+6f6quqe3qiRpYbJnSjpuRwtnVVV3Jrl85oEkT7bZSNI3sGdKOm6jrJy9DLgZKCBDxwp4Wk91SdJCZM+UdNyOGM6q6mXdv6tOTDmStHDZMyXNh1E/hPb5SZ7YbV+c5M1JVvRbmiQtTPZMScdj1I/S+N/AA0kOfbr1HcB7eqtKkhY2e6akORs1nB2sqgLWAW+tqi0MHg2XJH0ze6akORv1GwLuT3IFcDFwbpLHMfi+OEnSN7NnSpqzUVfOXg48BFxaVZ8HljP4XkxJ0jezZ0qas5FWzrrm8uah/c8C7+6rKElayOyZko7HqE9r/niSTyX5cpL7ktyf5L6+i5OkhcieKel4jPqeszcBP1ZVn+yzGElaJOyZkuZs1Pec/atNRpJGZs+UNGejrpxNJ/kT4M8ZvMkVgKr6sz6KkqQFzp4pac5GDWdPAh4AXjw0VoCNRpK+mT1T0pyN+rTmz/ddiCQtFvZMScdj1Kc1n5Hkb5P8U7f/rCS/1m9pkrQw2TMlHY9RHwh4B3AF8AhAVd0OrO+rKEla4OyZkuZs1HD2bVV104yxg/NdjCQtEvZMSXM2ajj7YpKnM3hDK0kuAj53tElJ1ibZnWRPkk2zHH9Vkl1Jbu9uAZw1dOyS7kMcP5XkkhHrlKQWzKlnShKM/rTm5cDVwPcl2Q98GvjZI01IsgzYApwH7AN2JJmqql1Dp90KTFbVA0l+kcEHN748yZOB1wGTDJrbzd3ce4/h2iRpXI65Z0rSIUcMZ0leNbS7HbiBwWrbvwE/wdB3x81iDbCnqvZ2r7UNWAd8LZxV1Q1D598IXNxtvwT4YFXd0839ILAW+OOjX5Ikjcdx9kxJAo6+cnZq9+/3As8F/gII8HPAzPdTzHQGcNfQ/j7geUc4/1LgA0eYe8ZRfp8kjdvx9ExJAo4Szqrq9QBJ/g74waq6v9v/deC6+SoiycUMbmH+0DHO2wBsAFixYsV8lSNJc3KsPdMeJmk2oz4Q8BTg4aH9h7uxI9kPnDm0v7wb+wZJXgS8Frigqh46lrlVdXVVTVbV5MTExFEvQpJOkJF6pj1M0mxGfSDg3cBNSd7f7V8I/OFR5uwAVidZxSBYrQd+ZviEJM8B3g6sraovDB26HvjNJKd1+y9m8JlBkrQQzKVnShIw+tc3XZXkA8ALuqGfr6pbjzLnYJKNDILWMmBrVe1MshmYrqop4LeBU4A/TQLw2aq6oKruSfIbDAIewOZDDwdIUuvm0jMl6ZBRV86oqluAW47lxatqO4MnlobHrhzaftER5m4Fth7L75OkVsylZ0oSjP6eM0mSJJ0AhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSG9hrMka5PsTrInyaZZjp+b5JYkB5NcNOPYo0lu636m+qxTki
SpFSf19cJJlgFbgPOAfcCOJFNVtWvotM8CrwB+dZaXeLCqnt1XfZIkSS3qLZwBa4A9VbUXIMk2YB3wtXBWVXd2xx7rsQ5JkqQFo8/bmmcAdw3t7+vGRvWEJNNJbkxy4bxWJkmS1Kg+V86O11lVtT/J04APJflEVd0xfEKSDcAGgBUrVoyjRkmaM3uYpNn0uXK2HzhzaH95NzaSqtrf/bsX+DDwnFnOubqqJqtqcmJi4viqlaQTzB4maTZ9hrMdwOokq5KcDKwHRnrqMslpSR7fbZ8OPJ+h96pJkiQtVr3d1qyqg0k2AtcDy4CtVbUzyWZguqqmkjwXeD9wGvBjSV5fVecA3w+8vXtQ4HHAG2Y85SlJWmBWbrpu3CVIC0Kv7zmrqu3A9hljVw5t72Bwu3PmvH8AntlnbZIkSS1q+YEAzcK/PCVJWtz8+iZJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIb0Gs6SrE2yO8meJJtmOX5ukluSHExy0YxjlyT5VPdzSZ91SpIktaK3cJZkGbAFOB84G/jpJGfPOO2zwCuAa2bMfTLwOuB5wBrgdUlO66tWSZKkVvS5crYG2FNVe6vqYWAbsG74hKq6s6puBx6bMfclwAer6p6quhf4ILC2x1olSZKa0Gc4OwO4a2h/Xzc2b3OTbEgynWT6wIEDcy5UksbBHiZpNgv6gYCqurqqJqtqcmJiYtzlSNIxsYdJmk2f4Ww/cObQ/vJurO+5kiRJC1af4WwHsDrJqiQnA+uBqRHnXg+8OMlp3YMAL+7GJEmSFrXewllVHQQ2MghVnwTeW1U7k2xOcgFAkucm2Qf8JPD2JDu7ufcAv8Eg4O0ANndjkiRJi9pJfb54VW0Hts8Yu3JoeweDW5azzd0KbO2zPkmSpNYs6AcCJEmSFhvDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1pNcPoW3Zyk3XjbsESZKkb7Jkw5kkLVT+cSktbt7WlCRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhvQazpKsTbI7yZ4km2Y5/vgkf9Id/3iSld34yiQPJrmt+3lbn3VKkiS1orcvPk+yDNgCnAfsA3YkmaqqXUOnXQrcW1Xfk2Q98Ebg5d2xO6rq2X3VJ0mS1KI+V87WAHuqam9VPQxsA9bNOGcd8K5u+33AjyZJjzVJkiQ1rc9wdgZw19D+vm5s1nOq6iDwZeA7u2Orktya5CNJXtBjnZIkSc3o7bbmcfocsKKq7k7y74E/T3JOVd03fFKSDcAGgBUrVoyhTEmaO3uYpNn0uXK2HzhzaH95NzbrOUlOAr4duLuqHqqquwGq6mbgDuAZM39BVV1dVZNVNTkxMdHDJUhSf+xhkmbTZzjbAaxOsirJycB6YGrGOVPAJd32RcCHqqqSTHQPFJDkacBqYG+PtUqSJDWht9uaVXUwyUbgemAZsLWqdibZDExX1RTwTuA9SfYA9zAIcADnApuTPAI8BlxWVff0VaskSVIren3PWVVtB7bPGLtyaPurwE/OMu9a4No+a5MkSWqR3xAgSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktSQXsNZkrVJdifZk2TTLMcfn+RPuuMfT7Jy6NgV3fjuJC/ps05JkqRW9BbOkiwDtgDnA2
cDP53k7BmnXQrcW1XfA/wO8MZu7tnAeuAcYC3w+93rSZIkLWp9rpytAfZU1d6qehjYBqybcc464F3d9vuAH02SbnxbVT1UVZ8G9nSvJ0mStKj1Gc7OAO4a2t/Xjc16TlUdBL4MfOeIcyVJkhadk8ZdwPFIsgHY0O1+JcnuY5h+OvDF+a+qOUvhOpfCNcISuc688Ziu86w+a+nbcfSwJfH/Al7nYrIUrnHe+lef4Ww/cObQ/vJubLZz9iU5Cfh24O4R51JVVwNXz6W4JNNVNTmXuQvJUrjOpXCN4HUuRnPtYUvlv5HXuXgshWuE+bvOPm9r7gBWJ1mV5GQGb/CfmnHOFHBJt30R8KGqqm58ffc05ypgNXBTj7VKkiQ1obeVs6o6mGQjcD2wDNhaVTuTbAamq2oKeCfwniR7gHsYBDi6894L7AIOApdX1aN91SpJktSKXt9zVlXbge0zxq4c2v4q8JOHmXsVcFWP5c3pdugCtBSucylcI3id+rql8t/I61w8lsI1wjxdZwZ3ESVJktQCv75JkiSpIUsunCXZmuQLSf5p3LX0JcmZSW5IsivJziSvHHdNfUjyhCQ3JfnH7jpfP+6a+pJkWZJbk/zluGvpS5I7k3wiyW1JpsddT4uWQv+CpdHDllL/AnvYMb/WUrutmeRc4CvAu6vqB8ZdTx+SPBV4alXdkuRU4GbgwqraNebS5lX3bRJPrKqvJPkW4KPAK6vqxjGXNu+SvAqYBJ5UVS8bdz19SHInMFlVi/6zkOZqKfQvWBo9bCn1L7CHHaslt3JWVX/H4MnQRauqPldVt3Tb9wOfZBF+w0INfKXb/ZbuZ9H9tZFkOfBS4P+MuxaN11LoX7A0ethS6V9gD5uLJRfOlpokK4HnAB8fcym96JbKbwO+AHywqhbjdb4F+B/AY2Ouo28F/HWSm7tPzpcWdQ9bIv0L7GHHzHC2iCU5BbgW+JWqum/c9fShqh6tqmcz+BaJNUkW1a2eJC8DvlBVN4+7lhPgP1XVDwLnA5d3t/C0hC32HrbY+xfYw+b6QoazRap7D8O1wP+tqj8bdz19q6ovATcAa8dcynx7PnBB916GbcCPJPmj8ZbUj6ra3/37BeD9wJrxVqRxWko9bBH3L7CHzYnhbBHq3mj6TuCTVfXmcdfTlyQTSb6j2/5W4Dzgn8da1DyrqiuqanlVrWTwDRofqqqLx1zWvEvyxO6N3yR5IvBiYFE/kajDWwo9bCn0L7CHzfX1llw4S/LHwMeA702yL8ml466pB88Hfo7BXyi3dT//edxF9eCpwA1JbmfwXa4frKpF+5j2IvcU4KNJ/pHB9+heV1V/NeaamrNE+hcsjR5m/1pc5rWHLbmP0pAkSWrZkls5kyRJapnhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UwLRpJfSfJtQ/vbD31OkCS1zP6lY+FHaagp3YdPpqq+6TvYuk+YnqyqL57wwiTpKOxfmi+unGnskqxMsjvJuxl8ovI7k0wn2Znk9d05vwx8N4MPbbyhG7szyend/E8meUc356+7T9wmyXOT3N59iOVvJ/mnbvycJDd147cnWT2eq5e0kNm/1AfDmVqxGvj9qjoHeHVVTQLPAn4oybOq6veAfwFeWFUvPMz8Ld38LwE/0Y3/AfDfui8XfnTo/MuA3+3GJ4F9839JkpYI+5fmleFMrfhMVd3Ybf9UkluAW4FzgLNHmP/pqrqt274ZWNm9n+PUqvpYN37N0PkfA/5nktcAZ1XVg8d7AZKWLPuX5pXhTK34N4Akq4BfBX60qp4FXAc8YYT5Dw1tPwqcdKSTq+oa4ALgQWB7kh+ZS9GShP1L88xwptY8iUGj+3KSpwDnDx27Hzh11Beqqi8B9yd5Xje0/tCxJE8D9na3G/6CwS0ISToe9i/NiyOmc+lEq6p/THIr8M/AXcDfDx2+GvirJP9ymPdtzOZS4B1JHgM+Any5G/8p4OeSPAJ8HvjNebkASUuW/UvzxY
/S0KKW5JSq+kq3vQl4alW9csxlSdJR2b+WLlfOtNi9NMkVDP5f/wzwivGWI0kjs38tUa6cSZIkNcQHAiRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqyP8HXagkopQK9XMAAAAASUVORK5CYII=",
                        "text/plain": [
                            "<Figure size 720x360 with 2 Axes>"
                        ]
                    },
                    "metadata": {
                        "needs_background": "light"
                    },
                    "output_type": "display_data"
                }
            ],
            "source": [
                "_, (ax1k, ax2k) = plt.subplots(1, 2, sharey=True, figsize=(10,5))\n",
                "ax1k.hist(Xtr_100k[Xtr_100k !=0], 5, density= True)\n",
                "ax1k.set_title('Train')\n",
                "ax1k.set(xlabel=\"ratings\", ylabel=\"density\")\n",
                "ax2k.hist(Xtst_100k[Xtst_100k !=0], 5, density= True)\n",
                "ax2k.set_title('Test')\n",
                "ax2k.set(xlabel=\"ratings\", ylabel=\"density\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "From the plots above we can see that the two datasets have very similar rating distributions. The main difference is in the degree of sparsness of the user/item affinity matrix; this is an important factor as it states the ratio between datapoints and unrated movies to infere. Note that the split function returns the total (or per dataset) sparsness, not the user-wise one. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {},
            "outputs": [],
            "source": [
                "#collection of evaluation metrics for later use\n",
                "def ranking_metrics(\n",
                "    data_size,\n",
                "    data_true,\n",
                "    data_pred,\n",
                "    K\n",
                "):\n",
                "\n",
                "    eval_map = map_at_k(data_true, data_pred, col_user=\"userID\", col_item=\"movieID\", \n",
                "                    col_rating=\"rating\", col_prediction=\"prediction\", \n",
                "                    relevancy_method=\"top_k\", k= K)\n",
                "\n",
                "    eval_ndcg = ndcg_at_k(data_true, data_pred, col_user=\"userID\", col_item=\"movieID\", \n",
                "                      col_rating=\"rating\", col_prediction=\"prediction\", \n",
                "                      relevancy_method=\"top_k\", k= K)\n",
                "\n",
                "    eval_precision = precision_at_k(data_true, data_pred, col_user=\"userID\", col_item=\"movieID\", \n",
                "                               col_rating=\"rating\", col_prediction=\"prediction\", \n",
                "                               relevancy_method=\"top_k\", k= K)\n",
                "\n",
                "    eval_recall = recall_at_k(data_true, data_pred, col_user=\"userID\", col_item=\"movieID\", \n",
                "                          col_rating=\"rating\", col_prediction=\"prediction\", \n",
                "                          relevancy_method=\"top_k\", k= K)\n",
                "\n",
                "    \n",
                "    df_result = pd.DataFrame(\n",
                "        {   \"Dataset\": data_size,\n",
                "            \"K\": K,\n",
                "            \"MAP\": eval_map,\n",
                "            \"nDCG@k\": eval_ndcg,\n",
                "            \"Precision@k\": eval_precision,\n",
                "            \"Recall@k\": eval_recall,\n",
                "        }, \n",
                "        index=[0]\n",
                "    )\n",
                "    \n",
                "    return df_result"
            ]
        },
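        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "For reference, precision and recall at $k$ are usually defined per user as follows. This is a textbook sketch: the symbols $R_u$ and $T_u^k$ are introduced here for illustration, and the averaging and relevancy conventions of the library functions used above may differ in detail.\n",
                "\n",
                "$$\\text{Precision@}k = \\frac{|R_u \\cap T_u^k|}{k}, \\qquad \\text{Recall@}k = \\frac{|R_u \\cap T_u^k|}{|R_u|}$$\n",
                "\n",
                "Here $R_u$ is the set of relevant (held-out) items for user $u$ and $T_u^k$ is the set of top-$k$ recommended items; the per-user values are typically averaged over all users. For example, if $k=10$ and $3$ of the $10$ recommended items appear among a user's $12$ relevant items, Precision@10 is $0.3$ and Recall@10 is $0.25$."
            ]
        },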
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# 4. Model application, performance and analysis of the results  \n",
                "\n",
                "The model has been implemented as a Tensorflow (TF) class with the TF session hidden inside the `fit()` method, so that no explicit call is needed. The algorithm operates in three different steps: \n",
                "\n",
                "- Model initialization: This is where we tell TF how to build the computational graph. The main parameters to specify are the number of hidden units, the number of training epochs and the minibatch size. \n",
                "\n",
                "- Model fit: This is where we train the model on the data. The method takes two arguments: the training and test set matrices. Note that the model is trained **only** on the training set, the test set is used to display the test set accuracy of the trained model, that in turn is an estimation of the generazation capabilities of the algorithm. It is generally useful to look at these quantities to have a first idea of the optimization behaviour.  \n",
                "\n",
                "- Model prediction: This is where we generate ratings for the unseen items. Once the model has been trained and we are satisfied with its overall accuracy, we sample new ratings from the learned distribution. In particular, we extract the top_k (e.g. 10) most relevant recommendations according to some predefined scorea. The prediction is then returned in a dataframe format ready to be analysed and deployed.  "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4.1 1m Dataset"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 11,
            "metadata": {},
            "outputs": [],
            "source": [
                "#First we initialize the model class\n",
                "model_1m = RBM(\n",
                "    possible_ratings=np.setdiff1d(np.unique(Xtr_1m), np.array([0])),\n",
                "    visible_units=Xtr_1m.shape[1],\n",
                "    hidden_units=1200,\n",
                "    training_epoch=30,\n",
                "    minibatch_size=350,\n",
                "    with_metrics=True\n",
                ")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Note that the first call to the `fit()` method may take longer to return, because TF needs to initialize the GPU session. This is not the case when the algorithm is trained a second time or more. As for the `minibatch_size`, choose a value that gives a good generalization error while maintaining a reasonable running time. The smaller the size, the closer you get to stochastic gradient descent, but training takes longer. A large value (say, half of the full training set) will speed up training but will increase the generalization error."
            ]
        },
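        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "As a rough illustration of this trade-off, and assuming one parameter update per minibatch, the number of updates per epoch is approximately the number of users divided by `minibatch_size`. Taking about 6,040 rows for the training matrix (the number of users in MovieLens 1M; the exact row count of `Xtr_1m` is an assumption here), the setting above gives\n",
                "\n",
                "$$ \\frac{6040}{350} \\approx 17 $$\n",
                "\n",
                "updates per epoch, whereas a minibatch of half the training set would give only 2. Fewer updates make each epoch cheaper, but each update then averages over many more users."
            ]
        },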
        {
            "cell_type": "code",
            "execution_count": 12,
            "metadata": {
                "inputHidden": false,
                "outputHidden": false
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Took 14.37 seconds for training.\n"
                    ]
                },
                {
                    "data": {
                        "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVEAAAE9CAYAAACyQFFjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAn0UlEQVR4nO3dd3xV9f3H8dcnC0iQECDMAGHIRgQiiKCg/lQcdaC4wYFVHN2ttb9Wu2uH9VdrlUoRUUCtVRSr1lkFBythY3AxAyoBBNnz8/sjB5tqCIR7b84d7+fjcR8k55zc++Y+yJtzzj3n+zV3R0REjkxa2AFERBKZSlREJAIqURGRCKhERUQioBIVEYmASlREJAIZYQeIpiZNmnhhYWHYMUQkyZSUlKx39/yq1iVViRYWFlJcXBx2DBFJMma28mDrdDgvIhIBlaiISARUoiIiEVCJiohEQCUqIhIBlaiISARUoiIiEVCJiohEQCUqIhKBlC7RPfv28/HmHWHHEJEEltIl+vrSddw8eW7YMUQkgaV0iZ7atRnrt+6mZOVnYUcRkQSV0iWanmZcO7CQcW8uCzuKiCSolC5RgOFFrZm5bAOrNmwPO4qIJKCUL9GcOhlcclwbxr+9POwoIpKAYlqiZjbezNaZ2eJqthliZvPNbImZTau0/DvBssVm9piZ1Y1VzqtPKOTpeWvYvH1PrF5CRJJUrPdEJwBDD7bSzBoC9wPnunt3YHiwvBXwTaDI3XsA6cClsQrZPLcup3RpymNzVsXqJUQkScW0RN19OrCxmk0uB6a4+6pg+3WV1mUA9cwsA8gG1sYsKHDdie2Y8PYKdu/dH8uXEZEkE/Y50U5Anpm9YWYlZjYSwN3XAHcBq4CPgc3u/nIsg3RvmUv7/ByeXxTTrhaRJBN2iWYAfYGzgTOA282sk5nlAecB7YCWQI6ZXVnVE5jZ9WZWbGbF5eXlEYW57sR2jHtzOe4e0fOISOoIu0TLgJfcfZu7rwemA72A/wGWu3u5u+8BpgAnVPUE7j7W3YvcvSg/v8rJ+A7bkE5N2blnHzOWbYjoeUQkdYRdolOBQWaWYWbZQH+glIrD+OPNLNvMDDg1WB5TaWnGdSe2Z9ybutxJRA5PrC9xegyYAXQ2szIzG2Vmo81sNIC7lwIvAguB2cA4d1/s7rOAJ4G5wKIg59hYZj3ggt6tWFi2iQ/Xba2NlxORBGfJdP6vqKjIozHv/P+98j7rtuzizmE9o5BKRBKdmZW4e1FV68I+nI9LIwa05fmFa9mwdVfYUUQkzqlEq9Ckfh3O6tmCSTN18b2IVE8lehCjBrVj4syV7NyzL+woIhLHVKIHcXSzo+jRqgHPzFsTdhQRiWMq0Wp8/cT2jHtrOXv36VZQEamaSrQaJ3RoTNtG2Zx339vMX70p7DgiEodUotUwM8ZdVcR1J7bj648U85NnFrF5h4bLE5H/UIkegplxQe8CXv3OYNzhtLun8cy8Nbq/XkQAlehhy83O5NcX9OSBEX0ZO30ZV4ybxUfluqtJJNWpRGuod5s8nr1lIKd2bcZFY97h7pff02VQIilMJXoEMtLTGDWoHf/61km8+/Hn3PrkwrAjiUhIVKIRaJ5blz9d2pu3PlzPMh3ai6QklWiE6tfJYMTxbXlgmuauF0lFKtEouGZgIS+9+wlrN+0IO4qI1DKVaBQ0zM7i4qLWjJ2uvVGRVKMSjZLrBrXj6XlrWK/h80RSiko0Spo2qMs5x7Tgobc1tYhIKlGJRtENJ3Vg8qxVfL5Tt4aKpAqVaBS1aZzNyZ2bMnHGyrCjiEgtUYlG2Y1DOvDQ2yvYsVt3MYmkApVolHVqdhR92zbk8TmaWkQkFahEY+CmIR0ZO30Zu/dqMGeRZKcSjYFerRvSsWl9TS0ikgJUojFy05COjJn2Efv2a9xRkWSmEo2R49s3Ii87kxcWfRx2FBGJIZVojJgZN5/ckf
te/1Cj4IskMZVoDJ3SpSkAr7+3LuQkIhIrKtEYOrA3+pd/a29UJFmpRGPsrJ4tcOBXz5eqSEWSkEo0xtLTjAlX9+OdjzZw9yvvhx1HRKJMJVoLcrMzmTiqHy8s+pj73/gw7DgiEkUxLVEzG29m68xscTXbDDGz+Wa2xMymVVre0MyeNLOlZlZqZgNimTXWmtSvw+Trjufx2as1XJ5IEon1nugEYOjBVppZQ+B+4Fx37w4Mr7T6HuBFd+8C9AJKYxezdjTPrcvk6/oz7s3l/F331oskhZiWqLtPBzZWs8nlwBR3XxVsvw7AzHKBk4AHg+W73X1TLLPWltaNspk4qh93v/I+U+frtlCRRBf2OdFOQJ6ZvWFmJWY2MljeDigHHjKzeWY2zsxywosZXe3z6/PItf355XOlvLj4k7DjiEgEwi7RDKAvcDZwBnC7mXUKlvcBxrh7b2AbcFtVT2Bm15tZsZkVl5eX11LsyHVufhQPXX0cP356EW/oYnyRhBV2iZYBL7n7NndfD0yn4vxnGVDm7rOC7Z6kolS/wt3HunuRuxfl5+fXSuho6VmQy9iRffnuEwsoWVndWQ8RiVdhl+hUYJCZZZhZNtAfKHX3T4DVZtY52O5U4N2wQsZS37aN+PFZXbnrJV1DKpKIYn2J02PADKCzmZWZ2SgzG21mowHcvRR4EVgIzAbGufuBy6G+AUw2s4XAscBvYpk1TF/r1ZIPy7fywadbwo4iIjVkyXQrYlFRkRcXF4cd44jc/cr7fLZtN788v0fYUUTkS8ysxN2LqloX9uG8BK7o34ZnF6zVdMsiCUYlGieaNajLoI5NmFJSFnYUEakBlWgcGTmgLY/MXKnRnkQSiEo0jvRr14is9DTe/nBD2FFE5DCpROOImTFyQCEPz1gRdhQROUwq0Thzfu+WzFmxkdUbt4cdRUQOg0o0zmRnZTCsdwGTZ2mUJ5FEoBKNQyMGtOWJ4tXs3LMv7Cgicggq0TjUrkkOPVvl8txCzVkvEu9UonHqqhPa8vA7K3S5k0icU4nGqcGdmrJ5xx7mr94UdhQRqYZKNE6lpxkjjm/LIzNWhh1FRKqhEo1jw4sKeK30U9Zv3RV2FBE5CJVoHGuYncWZPVrw9zmrw44iIgehEo1zIwa0ZdLMlezdtz/sKCJSBZVonOvRKpdWDevxaumnYUcRkSqoRBPAyBMKGf+2LncSiUcq0QQwtHtzdu3Zx2//tVRFKhJnVKIJICsjjQnX9OON98q57/UPw44jIpWoRBNEXk4WE0f148mSMh56e3nYcUQkoBJNIE0b1GXSdf352/RlPFGsy55E4oFKNMEU5GUz8br+3PXSe7ywSAOUiIRNJZqAOuTXZ8I1/bhj6mJef29d2HFEUppKNEF1a9mAB0YU8f0nFjBrmeZkEgmLSjSB9W2bx58v681Nk+eysGxT2HFEUpJKNMEN7NiE3154DNdOKObDdVvDjiOSclSiSeC0bs245eQO/PjpRboYX6SWqUSTxIgBhXy+c6+mFBGpZSrRJJGeZvz83O7c+UIp23fvDTuOSMpQiSaRfu0aUVTYiDFvfBR2FJGUoRJNMj86qwsTZ65k1YbtYUcRSQkxLVEzG29m68xscTXbDDGz+Wa2xMymfWldupnNM7PnYpkzmbTIrcfXT2zPr55/N+woIikh1nuiE4ChB1tpZg2B+4Fz3b07MPxLm3wLKI1VuGQ1alA73vt0C9PfLw87ikjSi2mJuvt0YGM1m1wOTHH3VcH2X9zDaGYFwNnAuFhmTEZ1M9O5/exu/PyfS9ijaUVEYirsc6KdgDwze8PMSsxsZKV1fwJuBdQCR+DUrk0pyMvm4XdWhB1FJKmFXaIZQF8q9jjPAG43s05mdg6wzt1LDvUEZna9mRWbWXF5uQ5fDzAz7vhaN+5/4yPKt2jKZZFYCbtEy4CX3H2bu68HpgO9gIHAuWa2AngcOMXMJlX1BO4+1t2L3L0oPz+/tnInhA759bmobwG/f3
Fp2FFEklbYJToVGGRmGWaWDfQHSt39R+5e4O6FwKXAv939yjCDJqpvnNKRae+XM3/1prCjiCSlWF/i9BgwA+hsZmVmNsrMRpvZaAB3LwVeBBYCs4Fx7n7Qy6Gk5o6qm8mtQ7vw02eXsH+/7qsXiTZLpgErioqKvLi4OOwYcWf/fmfYmHe4on8bhhe1DjuOSMIxsxJ3L6pqXdiH81IL0oL76n/34lI+KtdweSLRpBJNEb1aN+TWoV24+qHZrNuyM+w4IklDJZpCLi5qzfC+rbnmoTls3aWRnkSiQSWaYr5xSkeOKWjIjZNKdDeTSBSoRFOMmfHL87pTJyONHz61UCPhi0RIJZqCMtLTuPeyPiwr38ZdL78XdhyRhJZxuBuaWR3gQqCw8s+5+y+iH0tirV5WOg9eVcRFf51B89x6jDi+bdiRRBJSTfZEpwLnAXuBbZUekqAa16/Dw9f0497XPuClJZ+EHUckIR32nihQ4O4HHRtUElObxtk8eNVxXPXQbJrUz6Jv20ZhRxJJKDXZE33HzHrGLImEpmdBLndf3IsbJs7l2QVr2blnX9iRRBLGYd/2aWbvAh2B5cAuwAB392NiF69mdNtnZKa9X864N5exsGwzZ/ZozrA+BRS1zSMtzcKOJhKq6m77rMnh/JlRyiNxanCnfAZ3yueTzTt5Zv4afvLMInbs2ccFvQsY1rsVhU1ywo4oEncOuSdqZg3c/XMzq/JkmbtXN/1HrdKeaHS5O0vWfs6UuWt4dsEa2jbO4ZKi1pzfuxVZGbo6TlJHdXuih1Oiz7n7OWa2HHAqDuMPcHdvH72okVGJxs6efft584NyHnp7BcvKt3HLKR25sE+BylRSQkQlmkhUorWjZOVG/vTqBypTSRlRK1EzywOOBuoeWBbM6BkXVKK1S2UqqSIqJWpm11ExD3wBMB84Hpjh7qdEKWfEVKLhqFymN5/ckUuPa61P9CWpRGtQ5m8BxwEr3f1koDewKfJ4kuj6tm3ExFH9+fNlxzJx5komzlwZdiSRWlOTEt3p7juh4j56d18KdI5NLElEfds24u6Le/Hn1z5g8/Y9YccRqRU1KdEyM2sIPAO8YmZTAe1yyH/p2qIBp3dvzj2vfRB2FJFacdgl6u4XuPsmd/8ZcDvwIHB+jHJJAvve6Z14el6Z5nOSlHBYJWpm6Wa29MD37j7N3Z91992xiyaJqkn9Oowe3IHfPF8adhSRmDusEnX3fcB7ZtYmxnkkSVw9sJAPy7cy/f3ysKOIxFRNzonmAUvM7DUze/bAI1bBJLHVyUjnf8/qyq+ef5e9mstJklhNBiC5PWYpJCmd3q0ZE95ewWNzVmvkfElaNdkTPSs4F/rFAzgrVsEk8ZkZt5/TjXtefZ/NO3TJkySnmpToaVUs0/B4Uq1uLRtwWrdm3KtLniRJHbJEzexGM1sEdDazhZUey4GFsY8oie67p3XmqbllLNMlT5KEDmdP9FHga8CzwZ8HHn3d/coDGwWDk4h8Rf5RdbhhcAd+88LSQ28skmAOWaLuvtndV7j7Ze6+stLjy4MxvxajjJIErhlYyPufbuGtD9aHHUUkqqI5bpmG7ZGDOnDJ0y+f0yVPklyiWaJfGVPPzMab2TozW3ywHzKzIWY238yWmNm0YFlrM3vdzN4Nln8rijklJGd0b0ZeTiaPzVkddhSRqIn1CLoTgIPOVR8MaHI/cK67dweGB6v2At9z925UjFt6s5l1i21UiTUz445zunPPq+/z2TbdMSzJIaaH88Go99VNZHc5MMXdVwXbrwv+/Njd5wZfbwFKgVZRzCoh6dayAWf3bMEfXn4v7CgiUVGjEjWzQWZ2TfB1vpm1q7T61CN4/U5Anpm9YWYlZjayitcspGIA6FlH8PwSh757emdeefdTFpZtCjuKSMQOu0TN7KfAD4EfBYsygUkH1h/h1MkZQF/gbOAM4HYz61TpNesDTwHfdvfPD5LrejMrNrPi8nINdpEIcutlcusZnbl96h
L270+eiRIlNdVkT/QC4FxgG4C7rwWOivD1y4CX3H2bu68HpgO9AMwsk4oCnezuUw72BO4+1t2L3L0oPz8/wjhSWy7sU0C6wRPF+pBJEltNSnS3V8xq5wBmlhOF158KDDKzDDPLBvoDpWZmVAz6XOrud0fhdSTOpKUZvzivB3e9/B6btutDJklcNSnRJ8zsAaChmX0deBX4W3U/YGaPATOouGW0zMxGmdloMxsN4O6lwItU3D46Gxjn7ouBgcAI4JTg8qf5ZqbBTpJMj1a5nNWzBX94SR8ySeKq6bzzpwGnU/FJ/Evu/kqsgh0JTZmceDZv38Opd0/joauPo2dBbthxRKoUlSmTg8P3f7v7D6jYA60XnLcUOWK52ZncOrQzt09drA+ZJCHV5HB+OlDHzFpRcQg+goqL6UUiclGfAszgHyX6kEkST01K1Nx9OzAMGOPuw4HusYklqSQtzfjleT34w0v6kEkST41K1MwGAFcAzwfL0qMfSVJRj1a5nNmjBXfpTiZJMDUp0W9TcaH90+6+xMzaA6/HJJWkpO+f3pkXF3/KorLNYUcROWyHXaLBvErnuvvvgu+Xufs3YxdNUk1u9oE7mRZruDxJGDX5dL7IzKaY2dzK04TEMpyknov6FtAoJ4urHprNRo30JAmgJofzk6n4NP5C/nuaEJGoSUsz/jayiJ6tGnLuX95i8Rod2kt8q8m88+Xu/mzMkogE0tOM287sQs9WuYwcP5ufnN2VYX0Kwo4lUqWalOhPzWwcFXMp7TqwsLrBQUQicfYxLejYtD43TCxmYdlmfnx2VzLTYz2OuEjN1ORf5DXAsVSMVH/gUP6cGGQS+ULn5kcx9ZZBrNq4nSvGzaJ8y65D/5BILarJnuhx7t45ZklEDiK3XibjRhbxp9c+4Ny/vMX9V/ShdxvN0C3xoSZ7ou9oniMJS1qa8d3TOvGL83pw3cPFPDNvTdiRRIDD3BMNxvccDFxhZsupOCdqgLv7MTHMJ/JfTuvWjMLGx3P5uFkcVTeDU7s2CzuSpLjDKlF3dzNrChwd4zwih3R0s6P428giRk2Yw9iRfenbtlHYkSSF1eRw/imgqbuvrPyIVTCR6hzbuiF3X3IsN0ycywefbgk7jqSwmpRof2CGmX0U3K20SHcsSZgGd8rnJ2d35arxs1m7aUfYcSRF1eTT+TNilkLkCJ3fuxXrt+5i5PjZ/OOGAeTlZIUdSVLMYZeoDt0lXl13YnvKt+7i2ofnMPm6/mRn1WTfQCQyuv1DksJtQ7vQvkl9bnl0Hns0ApTUIpWoJAUz47cX9sTdue2pRdRkAkaRSKhEJWlkpqdx3xV9WLZ+K795oZRde/eFHUlSgEpUkkp2VgbjrzqOpZ9sod+vX+P7/1jAtPfLdYgvMVOjeefjneadl8o+2byT5xd9zHML17Jyw3bO6N6crx3Tgv7tG5OeZmHHkwRS3bzzKlFJCas3bv+iUD/9fBdn9WjOJce1oVvLBmFHkwSgEhWpZPn6bfxzwVomzVxJnzZ5fOt/jqZrC5WpHFx1JapzopJy2jXJ4ZunHs20H5xMUWEeI8fP5sZJJSz95POwo0kCUolKyqqXlc51J7Zn+g9Opk+bPK4cN5ubJ8/lvU90L74cPpWopLx6Wel8/aT2TL91CMcU5HLFuJnc/KgGNpHDoxIVCWRnZXDD4A5M+8HJ9GyVyyVjZzJ/9aawY0mcU4mKfElOnQxGD+7Ab4f15MZJJazbsjPsSBLHYlqiZjbezNaZ2eJqthliZvPNbImZTau0fKiZvWdmH5rZbbHMKVKV07s355LjWnPTpLns3quL9aVqsd4TnUDF7KBVMrOGwP3Aue7eHRgeLE8H7gPOBLoBl2l+JwnDN085mrycLH7+zyVhR5E4FdMSdffpwMZqNrkcmOLuq4Lt1wXL+wEfuvsyd98NPA6cF8usIlVJSzPuvrgXs5Zv5NFZq8KOI3Eo7HOinYA8M3vDzErMbGSwvBWwutJ2ZcGyrz
Cz682s2MyKy8vLYxxXUtFRdTMZO6Ivf3z5PUpWVrdPIKko7BLNAPoCZ1Mxcv7tZtapJk/g7mPdvcjdi/Lz82ORUYT2+fW5a3gvbpo8l08264Mm+Y+wS7QMeMndt7n7emA60AtYA7SutF1BsEwkNCd3acrIAYXcMKmEnXs0zJ5UCLtEpwKDzCzDzLKpmAyvFJgDHG1m7cwsC7gUeDbEnCIA3DSkA60a1uWOqYs18LMAsb/E6TFgBtDZzMrMbJSZjTaz0QDuXgq8CCwEZgPj3H2xu+8FbgFeoqJUn3B3fTwqoTMz/nBRLxaWbWbiTE07JhrFSeSIrNqwnWFj3uZHZ3ZlWJ9WmGl80mSmUZxEoqxN42weurofE95ZwbAx7zBv1WdhR5KQqERFjlDPglym3jyQK/q3ZfSkEr79+DzWbtoRdiypZSpRkQikpRkX9S3g398bQutG2Zz15zf5v1feZ/vuvWFHk1qiEhWJgpw6GXzv9M48941BLFu/jVP/OI2n55Wxf3/yfOYgVVOJikRRQV42917Wm3sv682Et1dw8QMzNApUklOJisRAUWEjnr5pICd1yueC+96h9GNNPZKsVKIiMZKWZnzz1KP54ZlduHLcLF5fuu7QPyQJRyUqEmPn9mrJ2JFF/PCphUx4e3nYcSTKVKIitaBv2zyeuvEEJs9axR1TF7N3nwZ5ThYqUZFa0rpRNk/ddALL129j1MPFbNm5J+xIEgUqUZFa1KBuJg9dfRytG9XjojEzKPtse9iRJEIqUZFalpGexi/P68Elx7Vm2P3v8PS8Ms3hlMBUoiIhMDOuHdSOey7tzZMlZZz0+9e57/UP2bR9d9jRpIYywg4gksoGdGjMgA6NeXft54x/ezmD//AG5xzTgmsHtaNDfv2w48lhUImKxIFuLRtw1/BerNuyk0kzV3HJAzPo2SqXUYPaM7BjYw21F8c0nqhIHNq5Zx9T56/hwbeWk2bGVScUcv6xraiXlR52tJRU3XiiKlGROObuvP3hBia8s4K5qz5jeFEBI45vS0FedtjRUkp1JarDeZE4ZmYMOroJg45uwqoN23lkxgrOufct+rdrxNUntOP49o10qB8y7YmKJJhtu/YyZd4aHn5nBRlpFYf65/ZqSU4d7RPFig7nRZJQ5UP92cs3cHr35lzUt4B+hY1IS9PeaTTpcF4kCVU+1F+3ZSdT563ljqmL2blnPxf2KWBYn1a0bqRzp7GmPVGRJOLuLF7zOU+WrObZBWvp0rwBw4sKGNqjOdlZ2mc6UjqcF0lBu/bu47XSdfx9zmo+/Xwnz9w8kLqZukTqSGjKZJEUVCcjnbN6tmDCNcfRqdlR3DF1cdiRkpJKVCTJmRl3DutJycrP+Efx6rDjJB2VqEgKyKmTwZgr+3Lnv5ay9BPN9xRNKlGRFNGp2VH85Oyu3DRprgaEjiKVqEgKGdangP7tG3HblEUk04fKYVKJiqSYn36tO8vLtzFx5sqwoyQFlahIiqmbmc6YK/twz6sfsGD1prDjJDyVqEgKats4h19f0IObH52r0fQjFNMSNbPxZrbOzKq8QM3MhpjZZjObHzzuqLTuO2a2xMwWm9ljZlY3lllFUs3QHi04vVtzvvfEAvbv1/nRIxXrPdEJwNBDbPOmux8bPH4BYGatgG8CRe7eA0gHLo1pUpEUdNuZXdi4fTcPTF8WdpSEFdMSdffpwMYj/PEMoJ6ZZQDZwNqoBRMRALIy0rjv8j48+NZy5qw40l/V1BYP50QHmNkCM/uXmXUHcPc1wF3AKuBjYLO7v1zVD5vZ9WZWbGbF5eXltZdaJEm0bFiP31/Uk28/Pl/nR49A2CU6F2jr7r2Ae4FnAMwsDzgPaAe0BHLM7MqqnsDdx7p7kbsX5efn105qkSRzSpdmDO3RnFufXKjrR2so1BJ198/dfWvw9QtAppk1Af4HWO7u5e6+B5gCnBBiVJGkd+vQzny8eSePzND1ozURaomaWXMLJogxs35Bng1UHM
Yfb2bZwfpTgdLwkookvzoZ6dx7WW/uee0DlqzdHHachBHrS5weA2YAnc2szMxGmdloMxsdbHIRsNjMFgB/Bi71CrOAJ6k43F8U5Bwby6wiAoVNcvjp17rxjUfnsW3X3rDjJAQNyiwiX3HrkwvYtx/+eHGvsKPEBQ3KLCI18rNzuzN/9WdMmVsWdpS4pxIVka/IzsrgL5f34VfPl7KsfGvYceKaSlREqtS1RQO+c1onbnl0Hrv27gs7TtxSiYrIQV3Zvw1tGmVz5wtLw44StzSHqogclJnxuwuP4aw/v0mjnCyK2ubRPr8+zRrUIbg6MeWpREWkWrnZmfxtZBETZ67gT6+tZ1n5Vnbu2U/7/BzaN8mhQ3592ufXp12THJrUzyI3O5M6GakzNbNKVEQOqVvLBtw57Jgvvt+8fQ8frd/KR+u28lH5NqbOX8OKDdvYuG0Pm3fsJiMtjYbZmeTWy6RhdiYN62XRMDuTc45pyaCjm4T4N4k+laiI1FhudiZ92uTRp03eV9a5O9t372PTjj1s2r6bzdv3sGnHHsq37OLbf5/P/57VhWF9CkJIHRsqURGJKjMjp04GOXUyaNWw3n+tG9ixMVeNn8OGrbv5+kntQ0oYXfp0XkRqTcemR/GP0QN4ong1d75QmhQjRqlERaRWtWxYjyduGMDsFRv5wZML2btvf9iRIqISFZFal5eTxeTr+rN+6y5umFjCjt2JezG/SlREQpGdlcHfRhaRWy+TEQ/OYvP2PWFHOiIqUREJTWZ6GncN78WxrRsy/IF3+GTzzrAj1Zg+nReRUKWlGT8+uytNptfhzHum071lLm0aZ9O2UTZtG2fTulE2bRvnUL9OfNZVfKYSkZRiZowe3IGze7bgo/KtrNq4nZUbtjNnxWes2riNVRu3k5OVQetG2Vw7qB3n9moZduQvqERFJG60blSx5/ll7k75ll2UfrKFHz21kDWf7WD04PZxcf++zomKSNwzM5o2qMvgTvk8ddMJTJ2/hjumLmHf/vCvM1WJikhCaZFbjydGD+Cj8q2MnhT+5VEqURFJOA3qZjLhmn7kZKVz+biZbNy2O7QsKlERSUhZGWncffGxHN++MReOeYeVG7aFkkMlKiIJKy3N+OHQLlw7sJDhf53BgtWbaj9Drb+iiEiUjRhQyK/O78E1E+bw76Wf1uprq0RFJCmc3r05464q4gf/WEjJyo219roqURFJGn3a5HHX8F7cOGkuZZ9tr5XXVImKSFI5uUtTrj+pPV9/pIRtu/bG/PVUoiKSdEYNakePlg347hPz2R/jC/JVoiKSdMyMX13Qg43bdnP3K+/H9LVUoiKSlOpkpDPmyr48M38NU+evidnrqERFJGk1qV+HcVcV8fN/vsv8GF1DGtMSNbPxZrbOzBYfZP0QM9tsZvODxx2V1jU0syfNbKmZlZrZgFhmFZHk1KV5A3534TGMnljCx5t3RP35Y70nOgEYeoht3nT3Y4PHLyotvwd40d27AL2A0hhlFJEkd1q3Zlx1QiHXPxL9AUtiWqLuPh2o8VWvZpYLnAQ8GDzPbnffFN10IpJKRg9uT8em9fn+kwuiOlVzPJwTHWBmC8zsX2bWPVjWDigHHjKzeWY2zsxyQswoIgnOzLhzWE8+3rSDGcs2RO15wy7RuUBbd+8F3As8EyzPAPoAY9y9N7ANuK2qJzCz682s2MyKy8vLayGyiCSqupnpPH79AE7o0CRqzxlqibr75+6+Nfj6BSDTzJoAZUCZu88KNn2SilKt6jnGunuRuxfl5+fXSm4RSVxZGdGtvVBL1MyaWzBJipn1C/JscPdPgNVm1jnY9FTg3ZBiiogcVEwnqjOzx4AhQBMzKwN+CmQCuPtfgYuAG81sL7ADuNT/c8b3G8BkM8sClgHXxDKriMiRiGmJuvtlh1j/F+AvB1k3HyiKQSwRkagJ+4MlEZGEphIVEYmASlREJAIqURGRCKhERUQioBIVEYmASlREJAIWzdFMwmZm5c
DKGv5YE2B9DOJEW6LkhMTJqpzRlyhZa5qzrbtXeV95UpXokTCzYneP+4v6EyUnJE5W5Yy+RMkazZw6nBcRiYBKVEQkAipRGBt2gMOUKDkhcbIqZ/QlStao5Uz5c6IiIpHQnqiISARSukTNbKiZvWdmH5pZldOPxAMzW2Fmi4JppYvDzlNZVdNim1kjM3vFzD4I/swLM2OQqaqcPzOzNZWm7D4rzIxBptZm9rqZvWtmS8zsW8HyuHpPq8kZV++pmdU1s9nBPG5LzOznwfJ2ZjYr+N3/ezBu8ZG9RqoezptZOvA+cBoV05HMAS5z97gbQd/MVgBF7h5319+Z2UnAVuARd+8RLPs9sNHdfxv855Tn7j+Mw5w/A7a6+11hZqvMzFoALdx9rpkdBZQA5wNXE0fvaTU5LyaO3tNg5owcd99qZpnAW8C3gO8CU9z9cTP7K7DA3cccyWuk8p5oP+BDd1/m7ruBx4HzQs6UcA4yLfZ5wMPB1w9T8csVqiOdvru2ufvH7j43+HoLUAq0Is7e02pyxhWvsDX4NjN4OHAKFXO3QYTvZyqXaCtgdaXvy4jDfwQBB142sxIzuz7sMIehmbt/HHz9CdAszDCHcIuZLQwO90M/7VCZmRUCvYFZxPF7+qWcEGfvqZmlm9l8YB3wCvARsMnd9wabRPS7n8olmkgGuXsf4Ezg5uDQNCEEc2bF6zmjMUAH4FjgY+CPoaapxMzqA08B33b3zyuvi6f3tIqccfeeuvs+dz8WKKDiCLRLNJ8/lUt0DdC60vcFwbK44+5rgj/XAU9T8Q8hnn0anDM7cO5sXch5quTunwa/YPuBvxEn72tw7u4pYLK7TwkWx917WlXOeH1PAdx9E/A6MABoaGYH5piL6Hc/lUt0DnB08CldFnAp8GzImb7CzHKCE/eYWQ5wOrC4+p8K3bPAVcHXVwFTQ8xyUAdKKXABcfC+Bh+EPAiUuvvdlVbF1Xt6sJzx9p6aWb6ZNQy+rkfFB8mlVJTpRcFmEb2fKfvpPEBw+cWfgHRgvLv/OtxEX2Vm7anY+4SK2VkfjaeclafFBj6lYlrsZ4AngDZUjKp1sbuH+qHOQXIOoeKw04EVwA2VzjuGwswGAW8Ci4D9weL/peJ8Y9y8p9XkvIw4ek/N7BgqPjhKp2Kn8Ql3/0Xwe/U40AiYB1zp7ruO6DVSuURFRCKVyofzIiIRU4mKiERAJSoiEgGVqIhIBFSiIiIRUImKHISZDTGz58LOIfFNJSoiEgGVqCQ8M7syGDNyvpk9EAw4sdXM/i8YQ/I1M8sPtj3WzGYGA2Q8fWCADDPraGavBuNOzjWzDsHT1zezJ81sqZlNDu7UEfmCSlQSmpl1BS4BBgaDTOwDrgBygGJ37w5Mo+IOJYBHgB+6+zFU3G1zYPlk4D537wWcQMXgGVAxOtG3gW5Ae2BgjP9KkmAyDr2JSFw7FegLzAl2EutRMTjHfuDvwTaTgClmlgs0dPdpwfKHgX8EYxO0cvenAdx9J0DwfLPdvSz4fj5QSMXAviKASlQSnwEPu/uP/muh2e1f2u5I72+ufD/1PvQ7I1+iw3lJdK8BF5lZU/hiLqK2VPzbPjBKz+XAW+6+GfjMzE4Mlo8ApgUjs5eZ2fnBc9Qxs+za/EtI4tL/qpLQ3P1dM/sJFSP/pwF7gJuBbUC/YN06Ks6bQsWwZ38NSnIZcE2wfATwgJn9IniO4bX415AEplGcJCmZ2VZ3rx92Dkl+OpwXEYmA9kRFRCKgPVERkQioREVEIqASFRGJgEpURCQCKlERkQioREVEIvD/ma51n8fEUX4AAAAASUVORK5CYII=",
                        "text/plain": [
                            "<Figure size 360x360 with 1 Axes>"
                        ]
                    },
                    "metadata": {
                        "needs_background": "light"
                    },
                    "output_type": "display_data"
                }
            ],
            "source": [
                "#Model Fit\n",
                "with Timer() as train_time:\n",
                "    model_1m.fit(Xtr_1m)\n",
                "\n",
                "print(\"Took {:.2f} seconds for training.\".format(train_time.interval))\n",
                "\n",
                "# Plot the train RMSE as a function of the epochs\n",
                "line_graph(values=model_1m.rmse_train, labels='train', x_name='epoch', y_name='rmse_train')"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "During training, we evaluate the root mean squared error (RMSE) to get an idea of how learning is proceeding. Remember that in the RBM this is not the quantity being minimized, but plotting the RMSE per epoch gives a rough picture of the learning process and of how the hyperparameters should be adjusted. Generally, we would like to see the RMSE decrease monotonically as a function of the training epochs. Even if you use an automated hyperparameter optimization method, we strongly suggest spending some time manually inspecting the learning process; this will give you an idea of the value range to expect for the hyperparameters. Finally, note that most automated hyperparameter search methods are optimized for supervised learning, so they may not work as well for unsupervised tasks. \n",
                "\n",
                "The two final scores are the train/test mean average accuracies across the whole set, together with their difference. The accuracy is defined as: \n",
                "\n",
                "$$ AC = \\frac{1}{m} \\sum_{\\mu=1}^{m} \\frac{1}{s_\\mu} \\sum_{i=1}^{N_v} I(v=vp)_{\\mu,i}, $$\n",
                "\n",
                "where $m$ is the total number of users, $N_v$ is the total number of items (equal to the number of visible units) and $s_\\mu$ is the number of non-zero elements in row $\\mu$, i.e. the total number of ratings given by user $\\mu$. \n",
                "Remember that for a model to generalize well, the difference between train and test metrics should not be too large. To visualize these online metrics, set `with_metrics=True` in the `RBM()` constructor. When evaluating metrics the model takes a bit longer to run, but you need to do so only in the exploratory phase of your work."
            ]
        },
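        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "A small worked example may help to read this definition. The numbers are made up for illustration, and $I(v=vp)$ is read here as an indicator equal to 1 when the predicted rating matches the observed one. Take $m = 2$ users: the first has rated 4 items and the model reproduces 3 of those ratings, while the second has rated 2 items and the model reproduces 1 of them. Then\n",
                "\n",
                "$$ AC = \\frac{1}{2} \\left( \\frac{3}{4} + \\frac{1}{2} \\right) = 0.625, $$\n",
                "\n",
                "so each user contributes the fraction of their own ratings that is predicted correctly, regardless of how many items they have rated."
            ]
        },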
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### 4.1.2 Model Evaluation\n",
                "\n",
                "To evaluate the model performance and compare it against the other algorithms in this repository, we use the `recommend_k_items()` method. Note that the method operates on the user/item affinity matrix; the original user/item IDs are restored afterwards, when the result is converted to a pandas dataframe.  "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 13,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Took 2.39 seconds for prediction.\n"
                    ]
                }
            ],
            "source": [
                "# Number of top-scoring items to recommend\n",
                "K = 10\n",
                "\n",
                "# Model prediction on the test set Xtst_1m\n",
                "with Timer() as prediction_time:\n",
                "    top_k_1m = model_1m.recommend_k_items(Xtst_1m, top_k=K)\n",
                "\n",
                "print(\"Took {:.2f} seconds for prediction.\".format(prediction_time.interval))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "`recommend_k_items()` returns, for each user, the K items with the highest recommendation score. Here the recommendation score is computed by multiplying the predicted rating by its probability, i.e. the confidence the algorithm has in its output. So if two items both have a predicted rating of 5, but one with probability 0.5 and the other with probability 0.9, the latter will be considered more relevant. In order to inspect the predictions and use the evaluation metrics in this repository, we convert both `top_k_1m` and `Xtst_1m` to pandas dataframe format:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 14,
            "metadata": {},
            "outputs": [],
            "source": [
                "top_k_df_1m = am1m.map_back_sparse(top_k_1m, kind='prediction')\n",
                "test_df_1m = am1m.map_back_sparse(Xtst_1m, kind='ratings')"
            ]
        },
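        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Returning to the recommendation score described above, a quick calculation with illustrative numbers shows how it orders items. Two items with predicted rating 5 and probabilities 0.5 and 0.9 receive\n",
                "\n",
                "$$ 5 \\times 0.5 = 2.5 \\quad \\text{and} \\quad 5 \\times 0.9 = 4.5, $$\n",
                "\n",
                "so the second item ranks higher. Under the same rule, a hypothetical item with predicted rating 4 and probability 0.7 would score $4 \\times 0.7 = 2.8$ and would therefore also rank above the first item, despite its lower predicted rating."
            ]
        },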
        {
            "cell_type": "code",
            "execution_count": 15,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Dataset</th>\n",
                            "      <th>K</th>\n",
                            "      <th>MAP</th>\n",
                            "      <th>nDCG@k</th>\n",
                            "      <th>Precision@k</th>\n",
                            "      <th>Recall@k</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>mv 1m</td>\n",
                            "      <td>10</td>\n",
                            "      <td>0.27086</td>\n",
                            "      <td>0.677539</td>\n",
                            "      <td>0.572185</td>\n",
                            "      <td>0.309783</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "  Dataset   K      MAP    nDCG@k  Precision@k  Recall@k\n",
                            "0   mv 1m  10  0.27086  0.677539     0.572185  0.309783"
                        ]
                    },
                    "execution_count": 15,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "rating_1m= ranking_metrics(\n",
                "    data_size = \"mv 1m\",\n",
                "    data_true =test_df_1m,\n",
                "    data_pred =top_k_df_1m,\n",
                "    K =10)\n",
                "\n",
                "rating_1m"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Formally, one should train the model until the cost function flattens out, but in practice early stopping often does the job. In the example above, we chose the training settings to achieve higher ranking metrics. A shorter training run will also work, but it will lower the ranking metrics. "
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 4.2 100k Dataset"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 16,
            "metadata": {},
            "outputs": [],
            "source": [
                "#100k\n",
                "model_100k = RBM(\n",
                "    possible_ratings=np.setdiff1d(np.unique(Xtr_100k), np.array([0])),\n",
                "    visible_units=Xtr_100k.shape[1],\n",
                "    hidden_units=600,\n",
                "    training_epoch=30,\n",
                "    minibatch_size=60,\n",
                "    keep_prob=0.9,\n",
                "    with_metrics=True\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 17,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Took 2.21 seconds for training.\n"
                    ]
                },
                {
                    "data": {
                        "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVEAAAE9CAYAAACyQFFjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAuBUlEQVR4nO3deXiU9bn/8fedhUASCJCFJewJgiyyiKhAxbqCVVFbrVR7rLVaW9va01XPaavVX3tOq+05PVartlLUKta6191aRUFcwqLssu+QQNiXQJL798cMNECWmWQmTybzeV1XLpJnmbmZi3x4vs9yf83dERGRxkkJugARkUSmEBURaQKFqIhIEyhERUSaQCEqItIEClERkSZIC7qAWMrLy/M+ffoEXYaItDKzZ8/e6u75ta1rVSHap08fSkpKgi5DRFoZM1tT1zoN50VEmkAhKiLSBApREZEmUIiKiDRBXEPUzKaYWamZLahjfY6Z/d3MPjazhWZ2bY1115jZsvDXNfGsU0SkseJ9JDoVmFDP+puARe4+DDgT+I2ZtTGzzsBtwKnAaOA2M+sU51pFRKIW1xB193eA8vo2AdqbmQHZ4W0rgfOBN9y93N23A29QfxiLiAQi6PtEfw+8AGwE2gNfdPdqMysE1tXYbj1QGEB9IiL1CvrC0vnAPKA7MBz4vZl1iOYFzOwGMysxs5KysrLYVygiUo+gQ/Ra4BkPWQ6sAgYCG4CeNbbrEV52HHd/0N1Hufuo/Pxan8qqU0VlFZt27m9c5SIiBB+ia4GzAcysCzAAWAm8BpxnZp3CF5TOCy+LqbeWlPLjp+fH+mVFJInE9ZyomU0jdNU9z8zWE7ring7g7vcDdwJTzWw+YMCP3X1reN87gY/CL3WHu9d3gapRTu+Xx/ef/JiKyioy0lJj/fIikgTiGqLuPrmB9RsJHWXWtm4KMCUedR2Wk5lOcUE2c9bs4PSi3Hi+lYi0UkEP5wM3tjiPmcu3Bl2GiCSopA/RccV5zFCIikgjJX2IjuzdiWVbdrNz/6GgSxGRBJT0Ido2PZWRvTvx/sptQZciIgko6UMUdF5URBpPIYrOi4pI4ylEgUHdOrB970E9vSQiUVOIAikpxpiiPGYu13lREYmOQjRM50VFpDEUomGHz4u6e9CliEgCUYiG9crNpG16CstK9wRdiogkEIVoDeOK85ixTEN6EYmcQrQGnRcVkWgpRGs4vV8uH64q51BVddCliEiCUIjWkJudQc/OmXy8bkfQpYhIglCIHmNcfz29JCKRU4geQ+dFRSQaCtFjnNKnEws37mJPRWXQpYhIAlCIHiOzTRon9cjhw1V6BFREGqYQrUXoflGFqIg0TCFaC50XFZFIKURrMbQwh00791O6+0DQpYhIC6cQrUVaagqn9ctl1goN6UWkfgrROozrr+foRaRhCtE6HD4vqtZ4IlIfhWgd+uVl4cCqrXuDLkVEWjCFaB3MTFfpRaRBCtF6aBZQEWmIQrQeY4pDV+irqnVeVERqpxCtR0H7thR2ymT2mu1BlyIiLZRCtAETBnfllQWbgi5DRFoohWgDLhjalVcXbKZaQ3oRqYVCtAH9u7Qns00q89bvCLoUEWmBFKIRuGBoN15dsDnoMkSkBYpriJrZFDMrNbMFdaz/oZnNC38tMLMqM+scXvfvZrYwvHyambWNZ631mTCkKy/P36Snl0TkOPE+Ep0KTKhrpbvf5e7D3X04cCsw3d3LzawQ+A4wyt2HAKnAlXGutU6DunUgNcVYuHFXUCWISAsV1xB193eA8gg3nwxMq/FzGtDOzNKATGBjjMuLmJkxcUg3Xp6vq/QicrQWcU7UzDIJHbE+DeDuG4C7gbXAJmCnu78eXIUwcUhXXlmwWUN6ETlKiwhR4CJgpruXA5hZJ2AS0BfoDmSZ2dW17WhmN5hZiZmVlJWVxa3Ak3rkcLCymqVbdsftPUQk8b
SUEL2So4fy5wCr3L3M3Q8BzwBjatvR3R9091HuPio/Pz9uBZoZE4Z05ZX5ukovIv8SeIiaWQ4wHni+xuK1wGlmlmlmBpwNLA6ivpouGKqnl0TkaGnxfHEzmwacCeSZ2XrgNiAdwN3vD292KfC6ux9p3OnuH5jZU8AcoBKYCzwYz1ojMaJnJ3buP8Ty0j0UF2QHXY6ItADWmi6UjBo1yktKSuL6Hrc9v4D89hl866z+cX0fEWk5zGy2u4+qbV3gw/lEM2FIN17R00siEqYQjdLovp3ZsusAa7ftC7oUEWkBFKJRSk0xzlN7PBEJU4g2wsQhXXlZQ3oRQSHaKKf1y2Xttr1s2LE/6FJEJGAK0UZIT03hnBO7qD2eiChEGyvUY1TnRUWSnUK0kcYU57J0825Kdx0IuhQRCZBCtJEy0lI5+8QuvLZQQ3qRZKYQbYJQx3uFqEgyU4g2wfgT8lmwcSfb9lQEXYqIBEQh2gRt01MZf0I+ry/aEnQpIhIQhWgTnXFCPh+s3BZ0GSISEIVoExUXZLOibG/DG4pIq6QQbaKi/GxWlu3R3EsiSUoh2kQ57dLJzEhj007dLyqSjBSiMVCcn82Ksj1BlyEiAVCIxkBRQRYrShWiIslIIRoDRfnZLNeRqEhSUojGQHFBNitKdYVeJBkpRGNAR6IiyUshGgPdctqyr6KSnfsPBV2KiDQzhWgMmBn9dIVeJCkpRGMkdF5UISqSbBSiMVKUn6XzoiJJSCEaI0X5ukIvkowUojESakSiI1GRZKMQjZHeuVls2LGfisqqoEsRkWakEI2RNmkp9OjYjrXb9gVdiog0I4VoDPXLz2a5rtCLJBWFaAzpvKhI8lGIxlBRfpaOREWSjEI0hjRViEjyUYjG0OFHP6urNVWISLKIa4ia2RQzKzWzBXWs/6GZzQt/LTCzKjPrHF7X0cyeMrMlZrbYzE6PZ62xkNMunayMNDbv0lQhIski3keiU4EJda1097vcfbi7DwduBaa7e3l49e+AV919IDAMWBznWmOiWFfoRZJKXEPU3d8ByhvcMGQyMA3AzHKAM4CHwq9z0N13xKPGWCsqyNIVepEk0iLOiZpZJqEj1qfDi/oCZcCfzWyumf3JzLICKzAKOhIVSS4tIkSBi4CZNYbyacBI4A/uPgLYC9xS245mdoOZlZhZSVlZWfNUW48i3SsqklRaSoheSXgoH7YeWO/uH4R/fopQqB7H3R9091HuPio/Pz/OZTZMtzmJJJfAQzR8/nM88PzhZe6+GVhnZgPCi84GFgVQXtS6dtBUISLJJC2eL25m04AzgTwzWw/cBqQDuPv94c0uBV5392MP374NPGZmbYCVwLXxrDVWak4VMrJXp6DLEZE4i2uIuvvkCLaZSuhWqGOXzwNGxbyoZlBcELq4pBAVaf0CH863RkX5us1JJFkoRONAk9aJJA+FaBwU5esKvUiyUIjGgaYKEUkeCtE4ODxVyBpNFSLS6ilE40RThYgkB4VonOjikkhyUIjGiW5zEkkOCtE4KS7IZrlCVKTVU4jGSVFBNivL9mqqEJFWTiEaJx3appOdkcYmTRUi0qpF/Oy8mWUAnwf61NzP3e+IfVmtQ1F+6OJSYcd2QZciInESzZHo88AkoJJQk+TDX1KHogLNQy/S2kXTxamHu9c56Zwcrzg/m2UKUZFWLZoj0ffMbGjcKmmFigoiu+He3XUBSiRBRROi44DZZrbUzD4xs/lm9km8CmsNIpkqpHTXAS65dyb3/HN5M1UlIrEUzXB+YtyqaKW6dmjL/oOV7Nx3iJzM9OPWL960i689XMJJPXJ4a2kpN5/TP4AqRaQpGjwSNbMO4W931/EldTCz0JC+lpvu31pSytV/+oAfTxzI/3xxOMu27GbXAc3LJJJoIjkSfRy4EJgNOGA11jnQLw51tRpF4fmWTu79r6lCHpm1mnv+uZwH/23UkeUjenXi/RXbOG9w16BKFZFGaDBE3f3C8J99419O61OUn3
WkEUlVtXPni4uYsXwrT984hl65mUe2G1Ocy3sKUZGEE9VEdWbWCegPtD28zN3fiXVRrUlxQTZPzV7PnopKvjNtLhWVVTz9jTHktDv6HOnYojx+8LePA6pSRBormieWvgbcDPQA5gGnAbOAs+JSWStRlJ/Ngg27uPz+WQzrkcOdlwwhPfX4U9FDCnMo3V1B6a4DFHRoW8sriUhLFM0tTjcDpwBr3P2zwAhgRzyKak1652ax+8AhLhnenf+6bGitAQqQmmKc1q8zM1dsbeYKRaQpohnOH3D3A2aGmWW4+xIzGxC3ylqJNmkpzP7pubRNT21w27HFecxcvo1LR/RohspEJBaiORJdb2YdgeeAN8zseWBNPIpqbSIJUAiF6HvLt+Kup5dEEkXER6Lufmn429vN7C0gB3g1LlUlqX55WVQ7rN62j755WUGXIyIRiOhI1MxSzWzJ4Z/dfbq7v+DuB+NXWvIxM8YU5zJjuc6LiiSKiELU3auApWbWK871JL2xRaEhvYgkhmguLHUCFprZh9ToI+ruF8e8qiQ2tjiPO19aRHW1k5JiDe8gIoGKJkR/Grcq5IiuOW3JzWrDok27GFKYE3Q5ItKAaK7OXxA+F3rkC7ggXoUls7HFeTovKpIgognRc2tZpvZ4cTCmKI+ZClGRhBBJK7xvmNl8YEC4GfPhr1WAmjLHwen9cpm7dgcVlVVBlyIiDYi0Fd4rwH8Bt9RYvtvdyw//YGad3H17jOtLSjmZ6RTlZzF37Q5O65cbdDkiUo8Gj0Tdfae7r3b3ye6+psZX+TGbvnnsvmY2xcxKzWxBba9tZj80s3nhrwVmVmVmnWusTzWzuWb2YtR/swQ3plhDepFEEM050YbUdj/OVKDOGULd/S53H+7uw4FbgenHhPPNwOIY1pgwxkZ5XrR870EOHNLwX6S5xTJEj3vgO9xr9Ngj1rpMBqYd/sHMegCfA/4Uk+oSzKg+nVi6eTe7I5gyZPeBQ1x630xue35hM1QmIjXFMkQbzcwyCR2xPl1j8f8CPwKqg6gpaG3TUxnWsyMfrqr//yB355Zn5nNyr078c2kpn6zf0TwFiggQ/+F8pC4CZh4eypvZhUCpu89u8E3NbjCzEjMrKSsra0IJLU8k94s+9sFaVpTu4ZeXDeWH5w3g9hcWag57kWYUVYia2Tgzuzb8fb6Z1Zx36ewm1HElNYbywFjgYjNbDTwBnGVmf6ltR3d/0N1Hufuo/Pz8JpTQ8owpyuW95dvqXL9o4y5++8an3HvVSNqmp/KFk3tQWe08N29DM1YpktwiDlEzuw34MaELQADpwJFgq+VqfaSvmwOMB56v8Vq3unsPd+9DKGD/6e5XN+b1E9nQwhw27dxP2e6K49btqajkW4/P4WcXDqIoPxuAlBTj9osH86tXl7CnorK5yxVJStEciV4KXEy4+Yi7bwTa17eDmU0jNA/TADNbb2bXmdmNZnbjMa/7urvvrf1Vkldaagqn9svlvWOmDHF3fvLsfE7p05lLRhQetW5kr06MK87nnn8ua85SRZJWNCF60EMt1x3AzBrsGhy+t7Sbu6eHjywfcvf73f3+GttMdfcr63mNtw9P25yMxhblHner05Ml61i0aRe3Xzy41n1+PGEAT360jlVb9f+SSLxFE6JPmtkDQEczux74B/DH+JQlhx2ed+nwlCFLN+/mV68u5d4vjaRdm9qnHSno0JZvnFnEnS8uas5SRZJSxCHq7ncDTxG6DWkA8DN3vydehUlIcUE2h6qqWVu+j30HK7np8TncOnEg/bvUeyaFr4zpy+qte/nnki3NVKlIcopm3vksQhd43gjP8jnAzNLdveG7waXRzOzI0eictds5qUcOl4/q2eB+bdJS+NlFg/j53xcxtjiPjLTIJssTkehEM5x/B8gws0JCE9R9mdBjnRJnY4py+b83lzF37XbunDQk4v3OHFBAUX4Wf565On7FiSS5aELU3H0fcBnwB3e/HKj9yobE1Lj+ee
w/VMW9V40kKyOayQjgJ58bxAPTV7Bl14E4VSeS3KIKUTM7HbgKeCm8TGPEZtAtpx0lPzmHgV07RL1vn7wsJo/uxa9eWdLwxiIStWhC9LuEbrR/1t0Xmlk/4K24VCXHSU9t/BO6N322mPdWbGP2mkY9DyEi9Yjm6vx0d7/Y3X8V/nmlu38nfqVJrGRlpHHLxIH8v5eSsqugSFxF89jnKDN7xszm1JwmJJ7FSexcNKw7G3fsZ3np7qBLEWlVorlK8RjwQ2A+SdqeLpGlphgXndSd5+Zu5AfnDwi6HJFWI5oTbWXu/oK7r6o5TUjcKpOYu2REIc9/vOHI008i0nTRHIneZmZ/IjSX0pG2Qu7+TMyrkrgY3L0DbVJTmLN2Oyf37tzwDiLSoGhC9FpgIKEWeIeH8w4oRBOEmXHJ8EKem7tRISoSI9GE6CnurpNpCW7S8EIuvW8mP7toUJNumxKRkGh+i94zs0Fxq0SaRa/cTHrnZvLustY1lYpIUCIKUTMzQt3n55nZ0vDtTfN1i1NiumREIc/P2xh0GSKtQkTDeXd3MysA+se5HmkGnxvajbteW8reisqon8UXkaNFM5x/GiioeXuTbnFKTLnZGYzq3Yk3FqnXqEhTRROipwKzzGyFhvOJ75IRhZoVVCQGohnLnR+3KqTZnTuoCz95bgFb91SQl50RdDkiCSuaBiRravuKZ3ESP5lt0jh7YAEvfbIp6FJEEppuFExikzSkF2kyhWgS+0xxHuvK97Fmm6ZWFmkshWgSS0tN4XNDu+meUZEmUIgmuYuHh4b06uwk0jgK0SQ3sldHKqucBRt2BV2KSEJSiCY5M2PS8O66wCTSSApRYdLwQv7+8UaqqjWkF4mWQlQoLsimoEMGs1ZsC7oUkYSjEBWAULNmDelFoqYQFSA0G+jrCzdz4FBV0KWIJBSFqADQpUNbRvftzBUPzOLRWavZvvdg0CWJJAQ1k5Qj7r/6ZN5dvpVn5mzg168uZUxxLpeN7MFnBxTQJk3/34rUxlrTTdajRo3ykpKSoMtoFXYdOMQr8zfx9JwNLC/dw4UndeOykT0Y1iOH0EQHIsnDzGa7+6ha18UzRM1sCnAhUOruQ2pZ/0PgqvCPacCJQD6QBTwCdCE0o+iD7v67ht5PIRof68r38ezcDTwzZz3dO7bjka+OJk2T3EkSqS9E4/2bMBWYUNdKd7/L3Ye7+3DgVmC6u5cDlcD33X0QcBpwkybJC07Pzpl85+z+/PP7Z5KaYtz39oqgSxJpMeIaou7+DlAe4eaTgWnh/Ta5+5zw97uBxUBhXIqUiKWkGHd9YRiPzFrNx+t2BF2OSIvQIsZkZpZJ6Ij16VrW9QFGAB80c1lSi645bbn94sH8+1/nse9gZdDliASuRYQocBEwMzyUP8LMsgkF63fdvdYOGWZ2g5mVmFlJWZnmUm8OF57UnWE9O/LLlxcHXYpI4FpKiF5JeCh/mJmlEwrQx9z9mbp2dPcH3X2Uu4/Kz8+Pc5ly2M8nDeatJWW8tbQ06FJEAhV4iJpZDjAeeL7GMgMeAha7+2+Dqk3q1qFtOndfPoxbnv6Ect2YL0ksriFqZtOAWcAAM1tvZteZ2Y1mdmONzS4FXnf3mnNUjAW+DJxlZvPCXxfEs1aJ3ulFuUwaXsitz3yips6StHSzvTRJRWUVk34/k6+O68sVo3oGXY5IXAR5n6i0chlpqfzvlcP571eWsHbbvqDLEWl2ClFpsoFdO/CN8UV878l5auwsSUchKjFx3bi+pKemcP90Pc0kyUUhKjGRkmLcfcUwpsxYxZMfrdOFJkkaClGJmcKO7Xj0ulP5ywdr+NIfP2DV1r0N7ySS4BSiElODunfgmW+M4ewTC7jsvpnc+9ZyDlVVB12WSNwoRCXm0lJT+Npn+vHCt8bx4apyLrpnBnPXbg+6LJG4UIhK3PTsnMnUa0/hG2cWccOjs7n9hYXsqVDTEmldFKISV2bGpOGFvPHvZ7C3opLz/+edZp
maeee+Q3F/DxFQiEoz6ZjZhrsuH8aPJgzgv1+JX/enA4eq+PnfFzLiztd5b8XWuL2PyGEKUWlWE4d0Y0XZ3rgcKS7YsJOL7plB6e4KfnPFMH701Cc6fSBxpxCVZtUmLYVRfToxa2XsjhKrqp373l7ONVM+5KbPFvP7ySO4dEQPxhTl8ouX1PNU4kshKs1uXHEe7y6LTYiuK9/HlQ/O4p1Py3jh2+O4ZEThkdlIf3rhIN75tIy31fNU4kghKs1ubHEeM5c3LUTdnb+VrGPSvTM5b1BXHv/aaRR2bHfUNu3bpvPrL5zErc/M14UmiRuFqDS7gV3bs6eiinXljev6tHPfIW78y2wemrGKx68/levP6EdKitW67djiPM4d1IWf/31hU0oWqZNCVJqdmTG2OLfRR6P3vr2cjLRUnv/WWAZ27dDg9rdMHMjstdt5beHmRr2fSH0UohKIscV5zGhEiLo7L32yiW+cWURGWmpE+2S2SePuy4fxk+cWsG1PRdTvKVIfhagEYmxxHu+t2EZ1lP1HP1m/k4y0FAZ2bR/Vfqf06cylIwr56fMLYtZh6uN1O9i880BMXksSl0JUAlHYsR0d26WzeHOtM2HX6eX5m5g4tOuRK/DR+N65J/Dplj38/ZNNUe97rL0VlXztkRJ+oWmjk55CVAIzrn8eM6K41cndeXnBJi4Y2q1R79c2PZXfXD6MO/6+kNJdTTuCfOCdlQzv2ZEZy8oafYFMWgeFqAQm2vOiCzbsItWMQd0avphUl2E9OzJ5dC9ufWZ+o4f1m3ce4JFZq7ntokFcOboXD81Y1eh6JPEpRCUwp/XLZc6a7Rw4VBXR9i/N38TEod0aNZSv6dtn9WfjzgP89aN1jdr/rteWMnl0L3p0yuTaMX14du4GyvcebFJNkrgUohKYnHbp9O/SnjkR9Bp1d16ev4nPNXIoX1ObtBR+d+Vwfv3aUpZt2R3Vvgs27GT6p2V888wiAAo6tGXikK48OmtNk+uSxKQQlUB9JsLzogs3hi5ADe7e+KF8TSd0ac8tEwZy0+Nz2H8wsiNhd+cXLy3mu+f0p33b9CPLrz+jH4++vzri15HWRSEqgYr0EdCmXJWvy+WjejCoWwdue2FBRNu/ubiUrXsquPKUnkctL8rPZmSvTjw1u3GnBySxKUQlUCN6dWywNV4sh/I1mRn/79KhlKzezrNz19e77aGqan75ymL+44ITSUs9/tfm6+P78cd3V1Gp+aSSjkJUApWRlsrJvetvjbd4024qq52hhTkxf//sjDR+/6WR3PniYlaU7alzu2kfrqV7TjvOHJBf6/qTe3emoH0Gr+rR0qSjEJXAfaZ//a3xXp4fujc0lkP5mgZ178D3zj2Bmx6bU+udAjv3H+L/3lzGf1xwYr01fH18EQ9MXxnxrVMHK6v50VMf8+biLY2uXYKnEJXA1Xde9PBQvrE32EfqqlN7UVSQzZ0vLjpu3X1vLeesgQUMauCi1tkDC9h3sJJZKxueQ+pQVTXfmTaXteX7+NFTn7Bgw85G136syqpqtu2pYHnpHmavKecfi7ZEfReCRC4t6AJEBnRpz56KStaV76Nn58yj1i3dspuKymqG9Yj9UL4mM+O/LxvKhffM4MVPNnLhSd2BUNPnv5as47XvntHga6SkGF8/I3Q0OqYor87tqqqd7z35MQcqq3j4q6N5a0kp1z9SwjPfHEO3nHZ17leb/QeruOPFhSzauIvt+w6xfd9B9h2sokPbNDpmtqFjZjrpqSnsO1jJi9/+TFSvLZFRiErgUlKMMUV5vLdiK1/s3OuodS9/sokLYnxVvi7t26bz+8kjuebPHzK0MIfeuVn86tUlXDumL106tI3oNSaN6M5v3ljK4k27OLGWJ6uqq50fPfUJ5XsreOiaU8hIS2XCkG6s3raP66aW8LcbTycrI7Jfy537D/G1hz+isGM7br94MJ3CodmhbfpR/VUrq6oZ/cs3Wb99Hz06ZdbzitIYGs5LizCulv
Oi7n7kKaXmMrRHDt85q5ibHp/D+yu3UbJ6O9ef0Tfi/TPSUrl2bF8efGflcevcnf98bj7ryvfxx38bRdv0f7Xy+/oZ/TipRw43PzGXqgg6W5XtrmDyg+8zuHsOv71iOCN6daJPXhYdM9sc16A6LTWFc04s4LWFOvcaDwpRaRFqa4336ZY97D9YxYieHZu1lmvG9KGwYzuumfIh3z/vBDLbRDdg+9KpvXhraSnrt/+rMYm7c/sLC1myeTdTrj3luNc0M+68ZAj7DlY1OLneuvJ9XPHALM4b3IXbLhpUZ1f/miYM6cprC3TnQDwoRKVFqK013ssxelY+WmbGrz8/jK+f0Y/LRvaIev8ObdO5YlRPpsxYDYQC9JcvL2bO2h1MvXY02XUM19NTU/jDVScz/dNSHp21utZtlm3ZzRUPzOKa03vz3XNOiPizGVOUx+LNuyjbrabUsRbXEDWzKWZWama1PhJiZj80s3nhrwVmVmVmncPrJpjZUjNbbma3xLNOaRnGFh/9CGjoqnzXQGrJyUzne+cNIDWCo7zaXDu2D0/PWc/OfYf47RufMmP5Nh69bjQ57dLr3S8nM50/f2U0//fP5cfNUjpv3Q4m//EDfjxhIF8ZG/kpBgi1ARx/Qj7/0O1UMRfvI9GpwIS6Vrr7Xe4+3N2HA7cC09293MxSgXuBicAgYLKZDYpzrRKwcf3/1Rpv2Zbd7D5QyYienQKuqnG65bTj3EFduOqh93l1wWb+ct1oOma2iWjfXrmZ3H/1SL7/5McsCR+Zz1i2leumfsSvPj+US0YUNqqmCUO68qqG9DEX1xB193eA8gg3nwxMC38/Glju7ivd/SDwBDApDiVKC1KzNd7L8zczcWjXiM73tVQ3ji8iOyONx64/ldzsjKj2Pbl3Z3520SCum1rCYx+s4eYn5nLfVSM5+8Quja7nzAEFzF6znZ37g5k+urraYzY1S0vSIs6JmlkmoSPWp8OLCoGa3RzWh5dJK1azNV5z3GAfb8UF2Txxw+kUtI/s9qhjTRpeyBdP6cn/vPEpD391NKf2y21SPdkZaZzWrzNvLSlteOM4+OnzC/ivV5YE8t7x1CJCFLgImOnukR61HmFmN5hZiZmVlJWVxaE0aU7jivN4+L3V7Nh/kJN7JeZQPpa+c3Z/Zt5yFkNi1Dfg/MFdA5k6el35Pl78ZBN//WgdO/a1rgbWLSVEr+RfQ3mADUDNfmM9wsuO4+4Puvsodx+Vn197cwhJHOP65/Hawi1MHNItoYfysRTp1NCROOfELsxYtrXZe5/e9/YKrj6tF2efWMBjH6xt1veOt8BD1MxygPHA8zUWfwT0N7O+ZtaGUMi+EER90rxG9OpIZptUJg4J5qp8a9cpqw1De+TwzrLmG7Vt3LGfl+dv4rpx/bjhjH5MfW81FZWtp4F1vG9xmgbMAgaY2Xozu87MbjSzG2tsdinwurvvPbzA3SuBbwGvAYuBJ919YTxrlZYhIy2Vf3xvPKP7dg66lFaruW+8v3/6Cq48pSeds9owsGsHBnXrwHNzax1YJqS4Pjvv7pMj2GYqoVuhjl3+MvBy7KuSlq57x+iacEh0zhvUld+8/ikHK6tpkxbZcdTWPRXsraikd25WVO+1ZdcBnp+3kX98b/yRZTec0Y+fPb+Ay0/u2SpO2QQ+nBeR5tU1py1987J4P4KWfRDqOnXjo7O58sH3o74o9MD0lXx+ZA/y2//rFq8xRbm0TU/lraXB3CUQawpRkSQ0YUjkV+kfmrGS1BRjwpCu/PCpTyK+17NsdwVPz1nP18f3O2q5mXHDGf14oJYmLYlIISqShM4f3JXXF21psGPUsi27uX/6Su6+fBi3TBzI5p0HeCTC6aH/9O5KJg3vXmsbwQuGdmPD9v3MW7ejMeW3KApRkSTUNy+L3Kw2zF27vc5tDlVV8/2/fcwPzhtAz86ZZKSlcs/kEfzuzWUs3Fh/J/7yvQd54qN13Di+qNb16akpfHVcX/7YCo5GFa
IiSer8wfU/S/+Ht1fQMbMNk0f/65btPnlZ3HbRIL49bS57Kyrr3PehGSu5YGi3ei8SfvGUnry3Yitrt+2rc5tEoBAVSVLnD+7Kqws313qOc8GGnTz83mp+9fmhx7XbmzS8kJG9OnHbC7Xfdbhz3yEe+2At3zyz9qPQw7Iz0rhydC8empHYR6MKUZEkdWK39qSYsWjTrqOWV1RW8YO/fcx/fu7EOud8+vnFg5mzdnut93tOmbmKc0/sctx8WbX5ypg+PDt3A9v3Ju6joApRkSRlZrXeeP+7fyyjV+dMLq2n5V5WRhr3TB7BHS8uYvXWI8/JsOvAIR59fw03fbY4ohq6dGjL+YO78pf3I7tY1RIpREWS2OEh/WFz1m7nyZL1/OLS44fxxxrcPYfvntOfb02bc+QxzkfeW834E/Lpkxf5TfnXn9GPh2et4cChxHwUVCEqksRG9OzIjn2HWFEWms/qB09+zB2TBh91c3x9vnxab7rntOPXry5lT0Ulf565OuKj0MNO6NKeoYUdeLaZHgV9beFm1pXH7mKWQlQkiaWkGOcN7sJrCzdz12tLGVKYE1UfVzPj1184iVcXbObmaXM5vSiX4oLsqOu44Ywi/vjuyqMmKoyHA4eq+M9n51NRWR2z11SIiiS5CYO7MXXmal6av5E7Jg2Oev+OmW343yuH8+6yrXz7rP6NquG0fp3JzkirdQ6oisoq1m/fx+w123lt4eYmdeZ/4eONDO6e06igr0tcG5CISMt3ar/OmMEvLx0a8TxQxzqlT2fm/OzcOmcybYiZcf1n+vGrV5fw2sItlO4+QOmuCkp3H2BPRSV52RkUtM+gstp5Y9EW7r58WNTv4e5MmbGKWy84sVE11kUhKpLk0lNTmPHjs0hPbdrAtLEBetjEIV3ZdeAQ6Skp5HfIoEv7thR0yKBzZpsj3Z527jvE+LvfYl35vohuoarp/ZXlHKqq5oz+eU2q81gKURFpcoDGQlpqCled2rvebXIy07nq1F7c9/YK/uuyoVG9/pSZq7h2bN8G7zqIVvCfnIhIFK4b14+X529i4479Ee+zZtteSlaXc9nI2M93qRAVkYTSOasNXzylJw9MXxHxPg+/t4YrTulJZpvYD74VoiKScL72mb48N28jpbsONLjt7gOHeHrOev7t9D5xqUUhKiIJp6B9Wy4dURhRY+enZq9nXHEehXGadkYhKiIJ6cbxRTw1ez1b91TUuU1VtTP1vdV8dVyfuNWhEBWRhNQ1py0XntSNP727qs5t3lpSSsd26Yzs1SludShERSRhfePMIp74aG2drfTidVtTTQpREUlYPTplcv6grvx55vFHo4s37WJF2Z6oegE0hkJURBLaNz9bxKPvr2HXgaOfqZ86czVXn9qbNmnxjTmFqIgktN65WXx2QAEPz1x9ZNm2PRW8smATXzq1V9zfXyEqIgnvm58tZup7q9kTnjxv2odrmTikG7nZkfVFbQqFqIgkvOKCbE4vyuUv76/hYGU1j76/hmvjeFtTTWpAIiKtwrfOKubqP31ITrt0ivKzGdi1Q7O8r45ERaRVGNi1Ayf37shtzy/kq2P7Ntv76khURFqNm88+gd0HKjlrYEGzvadCVERajUHdO/D49ac163tqOC8i0gQKURGRJlCIiog0QVxD1MymmFmpmS2oZ5szzWyemS00s+k1lv97eNkCM5tmZm3jWauISGPE+0h0KjChrpVm1hG4D7jY3QcDl4eXFwLfAUa5+xAgFbgyzrWKiEQtriHq7u8A5fVs8iXgGXdfG96+tMa6NKCdmaUBmcDGuBUqItJIQZ8TPQHoZGZvm9lsM/s3AHffANwNrAU2ATvd/fUA6xQRqVXQIZoGnAx8Djgf+KmZnWBmnYBJQF+gO5BlZlfX9gJmdoOZlZhZSVlZWXPVLSICBB+i64HX3H2vu28F3gGGAecAq9y9zN0PAc8AY2p7AXd/0N1Hufuo/Pz8ZitcRASCD9HngXFmlmZmmcCpwG
JCw/jTzCzTQn39zw4vFxFpUeL62KeZTQPOBPLMbD1wG5AO4O73u/tiM3sV+ASoBv7k7gvC+z4FzAEqgbnAg/GsVUSkMczdg64hZsysDFgT5W55wNY4lBNriVInJE6tqjP2EqXWaOvs7e61ni9sVSHaGGZW4u6jgq6jIYlSJyROraoz9hKl1ljWGfQ5URGRhKYQFRFpAoVo4lywSpQ6IXFqVZ2xlyi1xqzOpD8nKiLSFDoSFRFpgqQOUTObYGZLzWy5md0SdD11MbPVZjY/3DKwJOh6aqqt3aGZdTazN8xsWfjPTkHWGK6ptjpvN7MN4c91npldEGSN4Zp6mtlbZrYo3Ary5vDyFvWZ1lNni/pMzaytmX1oZh+H6/x5eHlfM/sg/Lv/VzNr0+j3SNbhvJmlAp8C5xJ6/PQjYLK7Lwq0sFqY2WpCbQFb3P13ZnYGsAd4JNy2EDP7NVDu7v8d/s+pk7v/uAXWeTuwx93vDrK2msysG9DN3eeYWXtgNnAJ8BVa0GdaT51X0II+0/ATj1nuvsfM0oEZwM3A9wh1kHvCzO4HPnb3PzTmPZL5SHQ0sNzdV7r7QeAJQk1PJAp1tDucBDwc/v5hQr9cgYqgLWOL4O6b3H1O+PvdhB53LqSFfab11NmieMie8I/p4S8HzgKeCi9v0ueZzCFaCKyr8fN6WuA/gjAHXg+3C7wh6GIi0MXdN4W/3wx0CbKYBnzLzD4JD/cDP+1Qk5n1AUYAH9CCP9Nj6oQW9pmaWaqZzQNKgTeAFcAOd68Mb9Kk3/1kDtFEMs7dRwITgZvCQ9OE4KHzRS31nNEfgCJgOKG+tb8JtJoazCwbeBr4rrvvqrmuJX2mtdTZ4j5Td69y9+FAD0Ij0IGxfP1kDtENQM8aP/cIL2txwk2qD3f+f5bQP4SWbEv4nNnhc2elDWwfCHffEv4Fqwb+SAv5XMPn7p4GHnP3Z8KLW9xnWludLfUzBXD3HcBbwOlAx/CsGdDE3/1kDtGPgP7hq3RtCM3h9ELANR3HzLLCJ+4xsyzgPKDOif9aiBeAa8LfX0Oo5WGLcziUwi6lBXyu4QshDwGL3f23NVa1qM+0rjpb2mdqZvkWmssNM2tH6ELyYkJh+oXwZk36PJP26jxA+PaL/yU0Ed4Ud/9FsBUdz8z6ETr6hFDrwsdbUp012x0CWwi1O3wOeBLoRair1hXuHuhFnTrqPJPQsNOB1cDXa5x3DISZjQPeBeYTag8J8B+Ezje2mM+0njon04I+UzM7idCFo1RCB41Puvsd4d+rJ4DOhFptXu3uFY16j2QOURGRpkrm4byISJMpREVEmkAhKiLSBApREZEmUIiKiDSBQlSkDmZ2ppm9GHQd0rIpREVEmkAhKgnPzK4O94ycZ2YPhBtO7DGz/wn3kHzTzPLD2w43s/fDDTKePdwgw8yKzewf4b6Tc8ysKPzy2Wb2lJktMbPHwk/qiByhEJWEZmYnAl8ExoabTFQBVwFZQIm7DwamE3pCCeAR4MfufhKhp20OL38MuNfdhwFjCDXPgFB3ou8Cg4B+wNg4/5UkwaQ1vIlIi3Y2cDLwUfggsR2h5hzVwF/D2/wFeMbMcoCO7j49vPxh4G/h3gSF7v4sgLsfAAi/3ofuvj788zygD6HGviKAQlQSnwEPu/utRy00++kx2zX2+eaaz1NXod8ZOYaG85Lo3gS+YGYFcGQuot6E/m0f7tLzJWCGu+8EtpvZZ8LLvwxMD3dmX29ml4RfI8PMMpvzLyGJS/+rSkJz90Vm9hNCnf9TgEPATcBeYHR4XSmh86YQant2fzgkVwLXhpd/GXjAzO4Iv8blzfjXkASmLk7SKpnZHnfPDroOaf00nBcRaQIdiYqINIGOREVEmkAhKiLSBApREZEmUIiKiDSBQlREpAkUoiIiTfD/Ab6ZuC40oSk7AAAAAElFTkSuQmCC",
                        "text/plain": [
                            "<Figure size 360x360 with 1 Axes>"
                        ]
                    },
                    "metadata": {
                        "needs_background": "light"
                    },
                    "output_type": "display_data"
                }
            ],
            "source": [
                "with Timer() as train_time:\n",
                "    model_100k.fit(Xtr_100k)\n",
                "\n",
                "print(\"Took {:.2f} seconds for training.\".format(train_time.interval))\n",
                "\n",
                "# Plot the train RMSE as a function of the epochs\n",
                "line_graph(values=model_100k.rmse_train, labels='train', x_name='epoch', y_name='rmse_train')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 18,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Took 0.20 seconds for prediction.\n"
                    ]
                }
            ],
            "source": [
                "# Model prediction on the test set Xtst_100k\n",
                "with Timer() as prediction_time:\n",
                "    top_k_100k = model_100k.recommend_k_items(Xtst_100k)\n",
                "\n",
                "print(\"Took {:.2f} seconds for prediction.\".format(prediction_time.interval))\n",
                "\n",
                "# Map the sparse matrices back to dataframes\n",
                "top_k_df_100k = am100k.map_back_sparse(top_k_100k, kind='prediction')\n",
                "test_df_100k = am100k.map_back_sparse(Xtst_100k, kind='ratings')"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### 4.2.1 Model evaluation "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 19,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Dataset</th>\n",
                            "      <th>K</th>\n",
                            "      <th>MAP</th>\n",
                            "      <th>nDCG@k</th>\n",
                            "      <th>Precision@k</th>\n",
                            "      <th>Recall@k</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>mv 100k</td>\n",
                            "      <td>10</td>\n",
                            "      <td>0.143607</td>\n",
                            "      <td>0.412962</td>\n",
                            "      <td>0.338494</td>\n",
                            "      <td>0.214506</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "   Dataset   K       MAP    nDCG@k  Precision@k  Recall@k\n",
                            "0  mv 100k  10  0.143607  0.412962     0.338494  0.214506"
                        ]
                    },
                    "execution_count": 19,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "eval_100k = ranking_metrics(\n",
                "    data_size=\"mv 100k\",\n",
                "    data_true=test_df_100k,\n",
                "    data_pred=top_k_df_100k,\n",
                "    K=10,\n",
                ")\n",
                "\n",
                "eval_100k"
            ]
        }
    ],
    "metadata": {
        "interpreter": {
            "hash": "67434505f7f08e5031eee7757e853265d2f43dd6b5963eb755a27835ec0e1503"
        },
        "kernel_info": {
            "name": "python3"
        },
        "kernelspec": {
            "display_name": "tf37",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.7.12"
        },
        "nteract": {
            "version": "0.12.3"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}